dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Krek, Simon |
dc.date.accessioned | 2019-11-13T08:50:19Z |
dc.date.available | 2019-11-13T08:50:19Z |
dc.date.issued | 2019-11-18 |
dc.identifier.uri | http://hdl.handle.net/11356/1269 |
dc.description | Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types in the corpus taxonomy. Two lists were extracted for each part-of-speech category: 1) a list of lemmas with their text-type distribution; 2) a list of lower-case word forms with their normalized forms, lemmas, morphosyntactic tags, and text-type distribution. In addition, four lists were extracted from all words, regardless of part of speech: 1) all lemmas with their part-of-speech category and text-type distribution; 2) all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) all lower-case word forms with their normalized forms, lemmas, part-of-speech categories, and text-type distribution; 4) all morphosyntactic tags with their text-type distribution (the tags are also split into several columns). |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1364 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | frequency list |
dc.subject | spoken corpus |
dc.subject | words |
dc.subject | lemmas |
dc.subject | normalized forms |
dc.title | Frequency lists of words from the GOS 1.0 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
sponsor | Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other |
files.count | 1 |
files.size | 4717179 |
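The description above says each list records words together with their absolute frequencies and text-type distribution. As a minimal sketch, assuming the TSV files have a single header row, UTF-8 encoding, and plain integer frequency values (none of which this record states), the Python function below ranks the rows of any of the lists by a frequency column. The helper name `top_n` and any column name you pass to it are hypothetical; check the header row of the actual file first.

```python
import csv
from typing import Iterable


def top_n(lines: Iterable[str], freq_column: str, n: int = 10) -> list[dict[str, str]]:
    """Return the n rows of a tab-separated frequency list with the highest
    integer value in `freq_column`.

    `freq_column` is a hypothetical header name: read the header row of the
    actual GOS 1.0 file and pass the real column name.
    """
    # QUOTE_NONE: treat quote characters in transcribed words as ordinary text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or freq_column not in reader.fieldnames:
        raise KeyError(f"column {freq_column!r} not found in header {reader.fieldnames}")
    # Assumes frequencies are plain integers (no thousands separators).
    rows = sorted(reader, key=lambda row: int(row[freq_column]), reverse=True)
    return rows[:n]
```

To use it on a downloaded list, open the file with `open(path, encoding="utf-8", newline="")` and pass the file object as `lines`; `csv.DictReader` accepts any iterable of text lines, so the same function also works on lines read directly from the archive.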
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name: GOS1.0-words.zip
- Size: 4.5 MB
- Format: application/zip
- Description: Frequency lists of words in GOS1.0
- MD5: eac2a3ff4a60fc7d26625591db22bfee
- GOS1.0-words-verbs
  - GOS1.0-words-verbs-lemmas-taxonomy-entire.tsv (2 MB)
  - GOS1.0-words-verbs-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
- GOS1.0-words-prepositions
  - GOS1.0-words-prepositions-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (62 kB)
  - GOS1.0-words-prepositions-lemmas-taxonomy-entire.tsv (16 kB)
- GOS1.0-words-interjections
  - GOS1.0-words-interjections-lemmas-taxonomy-entire.tsv (12 kB)
  - GOS1.0-words-interjections-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (20 kB)
- GOS1.0-words-particles
  - GOS1.0-words-particles-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (63 kB)
  - GOS1.0-words-particles-lemmas-taxonomy-entire.tsv (16 kB)
- GOS1.0-words-all
  - GOS1.0-words-all-morphosyntactic_tags-split_MSD-taxonomy-entire.tsv (172 kB)
  - GOS1.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (9 MB)
  - GOS1.0-words-all-lemmas-parts_of_speech-taxonomy-entire.tsv (3 MB)
  - GOS1.0-words-all-lowercase_forms-normalized_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (10 MB)
- GOS1.0-words-adjectives
  - GOS1.0-words-adjectives-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (3 MB)
  - GOS1.0-words-adjectives-lemmas-taxonomy-entire.tsv (1 MB)
- GOS1.0-words-numerals
  - GOS1.0-words-numerals-lemmas-taxonomy-entire.tsv (92 kB)
  - GOS1.0-words-numerals-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (371 kB)
- GOS1.0-words-nouns
  - GOS1.0-words-nouns-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (5 MB)
  - GOS1.0-words-nouns-lemmas-taxonomy-entire.tsv (3 MB)
- GOS1.0-words-pronouns
  - GOS1.0-words-pronouns-lemmas-taxonomy-entire.tsv (84 kB)
  - GOS1.0-words-pronouns-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (587 kB)
- GOS1.0-words-residual
  - GOS1.0-words-residual-lemmas-taxonomy-entire.tsv (1 MB)
  - GOS1.0-words-residual-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (1 MB)
- GOS1.0-words-adverbs
  - GOS1.0-words-adverbs-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (581 kB)
  - GOS1.0-words-adverbs-lemmas-taxonomy-entire.tsv (280 kB)
- GOS1.0-words-abbreviations
  - GOS1.0-words-abbreviations-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (4 kB)
  - GOS1.0-words-abbreviations-lemmas-taxonomy-entire.tsv (4 kB)
- GOS1.0-words-conjunctions
  - GOS1.0-words-conjunctions-lemmas-taxonomy-entire.tsv (12 kB)
  - GOS1.0-words-conjunctions-lowercase_forms-normalized_forms-lemmas-morphosyntactic_tags-taxonomy-entire.tsv (73 kB)
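The record gives an MD5 checksum for GOS1.0-words.zip, and the listing appears to group the TSV files into one folder per part-of-speech category plus GOS1.0-words-all. The Python sketch below checks a downloaded copy against that checksum and reads a single list without unpacking the whole archive. It assumes the files are UTF-8 encoded and matches archive members by file name only, because the exact directory paths inside the archive are not stated in this record. The helpers `md5_of`, `verify`, and `read_member` are hypothetical names.

```python
import hashlib
import zipfile
from pathlib import Path, PurePosixPath

# Checksum copied from the MD5 field of this item record.
EXPECTED_MD5 = "eac2a3ff4a60fc7d26625591db22bfee"


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str) -> bool:
    """True if the file at `path` matches the published checksum."""
    return md5_of(path) == EXPECTED_MD5


def read_member(zip_path: str, basename: str, encoding: str = "utf-8") -> str:
    """Return the decoded text of the single archive member named `basename`,
    wherever it sits in the archive (the folder layout is an assumption)."""
    with zipfile.ZipFile(zip_path) as zf:
        matches = [n for n in zf.namelist() if PurePosixPath(n).name == basename]
        if len(matches) != 1:
            raise FileNotFoundError(f"{basename!r}: {len(matches)} matches in {zip_path}")
        return zf.read(matches[0]).decode(encoding)
```

For example, `verify("GOS1.0-words.zip")` should return `True` for an intact download, after which `read_member("GOS1.0-words.zip", "GOS1.0-words-nouns-lemmas-taxonomy-entire.tsv")` returns that list as one string. Matching on the bare file name keeps the sketch working whether or not the archive stores the per-category folders.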