
 
dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2020-11-02T12:35:56Z
dc.date.available 2020-11-02T12:35:56Z
dc.date.issued 2020-10-28
dc.identifier.uri http://hdl.handle.net/11356/1364
dc.description Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part of speech, two lists were extracted: 1) one containing lemmas and their text-type distribution; 2) one containing lower-case word forms as well as their standardized forms, lemmas, and morphosyntactic tags along with their text-type distribution. In addition, four lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all lower-case word forms with their standardized word forms, lemmas, part-of-speech categories, and text-type distribution; 4) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns). Compared to the previous version (http://hdl.handle.net/11356/1269), this version fixes several typos and replaces all instances of "normalized forms" with the more appropriate term "standardized forms" (as used in the SSJ project).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.replaces http://hdl.handle.net/11356/1269
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject frequency list
dc.subject spoken corpus
dc.subject words
dc.subject lemmas
dc.subject standardized forms
dc.title Frequency lists of words from the GOS 1.0 corpus 1.1
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
files.count 1
files.size 4717218
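
The description above says each list pairs words with absolute and relative frequencies. As a minimal sketch of how such a list could be processed, the Python function below reads tab-separated text with a header row and recomputes relative frequencies (per million) from an absolute-frequency column. The column names `lemma` and `frequency` in the usage note are hypothetical; this record does not state the actual headers, so check them in the TSV files first. The total is taken as the sum over the rows passed in, which may differ from the corpus-wide total used for the relative frequencies stored in the lists.

```python
import csv
import io


def relative_frequencies(tsv_text, key_column, count_column, per=1_000_000):
    """Map each key to its frequency per `per` items, based on the rows given.

    `key_column` and `count_column` are header names supplied by the caller;
    they are assumptions, not names taken from the published files.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in word forms literally
    )
    rows = list(reader)
    total = sum(int(row[count_column]) for row in rows)
    if total == 0:
        return {}
    return {row[key_column]: int(row[count_column]) * per / total for row in rows}
```

For example, `relative_frequencies(text, "lemma", "frequency")` would be called on the decoded contents of one of the lemma lists, with the two column names replaced by whatever the file's header row actually uses.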


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name GOS1.0-words.zip
Size 4.5 MB
Format application/zip
Description Frequency lists of words in GOS1.0
MD5 7cbe3b19360c2954c80bbb74cbc7968c
 File Preview  
  • GOS1.0-words-verbs
    • GOS1.0-words-verbs-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
    • GOS1.0-words-verbs-lemmas-taxonomy-entire.tsv (2 MB)
  • GOS1.0-words-prepositions
    • GOS1.0-words-prepositions-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (62 kB)
    • GOS1.0-words-prepositions-lemmas-taxonomy-entire.tsv (16 kB)
  • GOS1.0-words-interjections
    • GOS1.0-words-interjections-lemmas-taxonomy-entire.tsv (12 kB)
    • GOS1.0-words-interjections-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (20 kB)
  • GOS1.0-words-particles
    • GOS1.0-words-particles-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (63 kB)
    • GOS1.0-words-particles-lemmas-taxonomy-entire.tsv (16 kB)
  • GOS1.0-words-all
    • GOS1.0-words-all-lowercase_forms-standardized_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (10 MB)
    • GOS1.0-words-all-morphosyntactic_tags-split_MSD-taxonomy-entire.tsv (172 kB)
    • GOS1.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (9 MB)
    • GOS1.0-words-all-lemmas-parts_of_speech-taxonomy-entire.tsv (3 MB)
  • GOS1.0-words-adjectives
    • GOS1.0-words-adjectives-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
    • GOS1.0-words-adjectives-lemmas-taxonomy-entire.tsv (1 MB)
  • GOS1.0-words-numerals
    • GOS1.0-words-numerals-lemmas-taxonomy-entire.tsv (92 kB)
    • GOS1.0-words-numerals-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (371 kB)
  • GOS1.0-words-nouns
    • GOS1.0-words-nouns-lemmas-taxonomy-entire.tsv (3 MB)
    • GOS1.0-words-nouns-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (5 MB)
  • GOS1.0-words-pronouns
    • GOS1.0-words-pronouns-lemmas-taxonomy-entire.tsv (84 kB)
    • GOS1.0-words-pronouns-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (587 kB)
  • GOS1.0-words-residual
    • GOS1.0-words-residual-lemmas-taxonomy-entire.tsv (1 MB)
    • GOS1.0-words-residual-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (1 MB)
  • GOS1.0-words-adverbs
    • GOS1.0-words-adverbs-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (581 kB)
    • GOS1.0-words-adverbs-lemmas-taxonomy-entire.tsv (280 kB)
  • GOS1.0-words-abbreviations
    • GOS1.0-words-abbreviations-lemmas-taxonomy-entire.tsv (4 kB)
    • GOS1.0-words-abbreviations-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (4 kB)
  • GOS1.0-words-conjunctions
    • GOS1.0-words-conjunctions-lemmas-taxonomy-entire.tsv (12 kB)
    • GOS1.0-words-conjunctions-lowercase_forms-standardized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (73 kB)
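
Since the record publishes an MD5 checksum for GOS1.0-words.zip and the preview shows one folder per part-of-speech category, a downloaded copy can be checked and inspected with a short script. The Python sketch below is one possible way to do that with the standard library only. The function names are illustrative, the local file path is up to the caller, and the assumption that archive members are stored as `folder/file.tsv` paths is inferred from the preview rather than confirmed.

```python
import hashlib
import zipfile
from collections import defaultdict

# MD5 value as listed in this record for GOS1.0-words.zip.
EXPECTED_MD5 = "7cbe3b19360c2954c80bbb74cbc7968c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tsv_members_by_folder(path):
    """Group the .tsv members of a zip archive by their containing folder.

    Assumes members are stored as "folder/file.tsv"; members without a
    folder are grouped under the empty string.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(".tsv"):
                folder, _, filename = name.rpartition("/")
                groups[folder].append(filename)
    return dict(groups)
```

A typical check would compare `md5_of_file("GOS1.0-words.zip")` with `EXPECTED_MD5` before calling `tsv_members_by_folder` on the same path; a mismatch suggests an incomplete or altered download.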
