dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Krek, Simon |
dc.date.accessioned | 2019-11-13T09:03:17Z |
dc.date.available | 2019-11-13T09:03:17Z |
dc.date.issued | 2019-11-18 |
dc.identifier.uri | http://hdl.handle.net/11356/1273 |
dc.description | Frequency lists of words were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. For each part-of-speech category, two lists were extracted: 1) one containing lemmas and their text-type distribution; 2) one containing lower-case word forms with their lemmas and morphosyntactic tags, along with their text-type distribution. In addition, three lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns). |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | words |
dc.subject | lemmas |
dc.subject | morphosyntactic tags |
dc.title | Frequency lists of words from the Gigafida 2.0 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
files.count | 13 |
files.size | 283556149 |
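The larger lists in this item are several hundred megabytes once unpacked, so it can be convenient to inspect a list without extracting the archive first. The following is a minimal sketch, not part of the resource itself: the function name `preview_tsv` is hypothetical, and it assumes the .tsv files are UTF-8 encoded and tab-separated. It makes no assumption about column names and simply returns the first rows as they appear in the file.

```python
import csv
import io
import zipfile


def preview_tsv(archive, member, limit=5):
    """Return the first `limit` rows of a TSV member inside a ZIP archive.

    `archive` is a path or a binary file-like object; `member` is the name
    of the .tsv file as stored in the archive (assumed UTF-8 encoded).
    """
    rows = []
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # Decode the compressed stream lazily instead of loading it whole.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters in word forms as plain text.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                rows.append(row)
                if len(rows) >= limit:
                    break
    return rows
```

A call such as `preview_tsv("GF2.0-words-verbs.zip", name)` would need `name` to match the member path stored inside the archive, which can be checked with `zipfile.ZipFile(...).namelist()`.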
Files in this item
Download all files in item (270.42 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name
- GF2.0-words-all.zip
- Size
- 110.5 MB
- Format
- application/zip
- Description
- Frequency lists of all words in Gigafida 2.0
- MD5
- b20a959f9c113aeb6504f0d753d36d10
- GF2.0-words-all-lemmas-parts_of_speech-taxonomy-entire.tsv (536 MB)
- GF2.0-words-all-morphosyntactic_tags-split_MSD-taxonomy-entire.tsv (283 kB)
- GF2.0-words-all-lemmas-parts_of_speech-taxonomy-short.tsv (23 MB)
- GF2.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (832 MB)
- GF2.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-short.tsv (26 MB)
- Name
- GF2.0-words-nouns.zip
- Size
- 87.48 MB
- Format
- application/zip
- Description
- Frequency lists of nouns in Gigafida 2.0
- MD5
- 096568189982fae1ca53081258b5e810
- Name
- GF2.0-words-verbs.zip
- Size
- 9.2 MB
- Format
- application/zip
- Description
- Frequency lists of verbs in Gigafida 2.0
- MD5
- 628b802ef22b12b1ddc2a1268f76f610
- Name
- GF2.0-words-adjectives.zip
- Size
- 36.92 MB
- Format
- application/zip
- Description
- Frequency lists of adjectives in Gigafida 2.0
- MD5
- ef2276073cfb65799430bab546ffc016
- Name
- GF2.0-words-adverbs.zip
- Size
- 1.55 MB
- Format
- application/zip
- Description
- Frequency lists of adverbs in Gigafida 2.0
- MD5
- 2d993431f604f6ab48a81f0bf9393166
- Name
- GF2.0-words-pronouns.zip
- Size
- 184.68 KB
- Format
- application/zip
- Description
- Frequency lists of pronouns in Gigafida 2.0
- MD5
- 77d6cbcf2fdf4b7a2c71b0b05074bfbe
- Name
- GF2.0-words-numerals.zip
- Size
- 9.63 MB
- Format
- application/zip
- Description
- Frequency lists of numerals in Gigafida 2.0
- MD5
- f09371c656d3c2b017a7a8f698347ced
- Name
- GF2.0-words-conjunctions.zip
- Size
- 31.89 KB
- Format
- application/zip
- Description
- Frequency lists of conjunctions in Gigafida 2.0
- MD5
- ce38989d7debf1cb3594b89acd6a20ef
- Name
- GF2.0-words-prepositions.zip
- Size
- 47.35 KB
- Format
- application/zip
- Description
- Frequency lists of prepositions in Gigafida 2.0
- MD5
- de408eb097b87ca11cf6e3875397cdf6
- Name
- GF2.0-words-particles.zip
- Size
- 19.02 KB
- Format
- application/zip
- Description
- Frequency lists of particles in Gigafida 2.0
- MD5
- 69efee495de91654d7b8709fe4aa021f
- Name
- GF2.0-words-interjections.zip
- Size
- 72.36 KB
- Format
- application/zip
- Description
- Frequency lists of interjections in Gigafida 2.0
- MD5
- 333f5680daab84a6c3b96175747ec3d6
- Name
- GF2.0-words-abbreviations.zip
- Size
- 142.91 KB
- Format
- application/zip
- Description
- Frequency lists of abbreviations in Gigafida 2.0
- MD5
- 7ccc14b0cc0741e91e892a7d770c330f
- Name
- GF2.0-words-residual.zip
- Size
- 14.65 MB
- Format
- application/zip
- Description
- Frequency lists of residual words in Gigafida 2.0
- MD5
- f8393ecbbc7337b938e41cddb992f713
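
Each archive above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch under stated assumptions: the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the file is assumed to have been saved locally under the name shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected):
    """Return True if the file's MD5 digest equals the listed checksum."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_listed_md5("GF2.0-words-all.zip", "b20a959f9c113aeb6504f0d753d36d10")` should return `True` for an intact copy of the first archive. MD5 is suitable here only as an integrity check against transfer errors, not as protection against deliberate tampering.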