dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2019-11-13T09:03:17Z
dc.date.available 2019-11-13T09:03:17Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/11356/1273
dc.description Frequency lists of words were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, two lists were extracted: 1) one containing lemmas and their text-type distribution, 2) one containing lower-case word forms as well as their lemmas and morphosyntactic tags along with their text-type distribution. In addition, three lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject words
dc.subject lemmas
dc.subject morphosyntactic tags
dc.title Frequency lists of words from the Gigafida 2.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
files.count 13
files.size 283556149
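
The lists described above are distributed as tab-separated (.tsv) files inside ZIP archives, and several of them are hundreds of megabytes when unpacked. The following is a minimal Python sketch, not part of the original record, for peeking at the header and first rows of one list without extracting the whole archive. The column layout is not documented in this record, so the sketch makes no assumption about column names. The function name `peek_tsv` is hypothetical and UTF-8 encoding is assumed.

```python
import csv
import io
import zipfile


def peek_tsv(zip_path, member, n_rows=5):
    """Return (header, first n_rows data rows) of a TSV stored inside a ZIP.

    Assumptions (not confirmed by the record): the file is UTF-8 encoded
    and its first line is a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Stream the member instead of extracting it to disk.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters in word forms as plain text.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            header = next(reader, [])
            rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `peek_tsv("GF2.0-words-verbs.zip", "GF2.0-words-verbs-lemmas-taxonomy-entire.tsv")` would read from a locally downloaded copy of the verbs archive; the local path is an assumption.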


 Files in this item

 Download all files in item (270.42 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name
GF2.0-words-all.zip
Size
110.5 MB
Format
application/zip
Description
Frequency lists of all words in Gigafida 2.0
MD5
b20a959f9c113aeb6504f0d753d36d10
 File Preview
    • GF2.0-words-all-lemmas-parts_of_speech-taxonomy-entire.tsv (536 MB)
    • GF2.0-words-all-morphosyntactic_tags-split_MSD-taxonomy-entire.tsv (283 kB)
    • GF2.0-words-all-lemmas-parts_of_speech-taxonomy-short.tsv (23 MB)
    • GF2.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-entire.tsv (832 MB)
    • GF2.0-words-all-lowercase_forms-lemmas-parts_of_speech-taxonomy-short.tsv (26 MB)
Name
GF2.0-words-nouns.zip
Size
87.48 MB
Format
application/zip
Description
Frequency lists of nouns in Gigafida 2.0
MD5
096568189982fae1ca53081258b5e810
 File Preview
    • GF2.0-words-nouns-lemmas-taxonomy-entire.tsv (313 MB)
    • GF2.0-words-nouns-lowercase_forms-taxonomy-entire.tsv (720 MB)
    • GF2.0-words-nouns-lemmas-taxonomy-short.tsv (22 MB)
    • GF2.0-words-nouns-lowercase_forms-taxonomy-short.tsv (29 MB)
Name
GF2.0-words-verbs.zip
Size
9.2 MB
Format
application/zip
Description
Frequency lists of verbs in Gigafida 2.0
MD5
628b802ef22b12b1ddc2a1268f76f610
 File Preview
    • GF2.0-words-verbs-lowercase_forms-taxonomy-entire.tsv (48 MB)
    • GF2.0-words-verbs-lowercase_forms-taxonomy-short.tsv (28 MB)
    • GF2.0-words-verbs-lemmas-taxonomy-entire.tsv (12 MB)
Name
GF2.0-words-adjectives.zip
Size
36.92 MB
Format
application/zip
Description
Frequency lists of adjectives in Gigafida 2.0
MD5
ef2276073cfb65799430bab546ffc016
 File Preview
    • GF2.0-words-adjectives-lemmas-taxonomy-entire.tsv (59 MB)
    • GF2.0-words-adjectives-lowercase_forms-taxonomy-entire.tsv (303 MB)
    • GF2.0-words-adjectives-lemmas-taxonomy-short.tsv (21 MB)
    • GF2.0-words-adjectives-lowercase_forms-taxonomy-short.tsv (30 MB)
Name
GF2.0-words-adverbs.zip
Size
1.55 MB
Format
application/zip
Description
Frequency lists of adverbs in Gigafida 2.0
MD5
2d993431f604f6ab48a81f0bf9393166
 File Preview
    • GF2.0-words-adverbs-lowercase_forms-taxonomy-entire.tsv (11 MB)
    • GF2.0-words-adverbs-lemmas-taxonomy-entire.tsv (9 MB)
Name
GF2.0-words-pronouns.zip
Size
184.68 KB
Format
application/zip
Description
Frequency lists of pronouns in Gigafida 2.0
MD5
77d6cbcf2fdf4b7a2c71b0b05074bfbe
 File Preview
    • GF2.0-words-pronouns-lemmas-taxonomy-entire.tsv (423 kB)
    • GF2.0-words-pronouns-lowercase_forms-taxonomy-entire.tsv (1 MB)
Name
GF2.0-words-numerals.zip
Size
9.63 MB
Format
application/zip
Description
Frequency lists of numerals in Gigafida 2.0
MD5
f09371c656d3c2b017a7a8f698347ced
 File Preview
    • GF2.0-words-numerals-lowercase_forms-taxonomy-entire.tsv (69 MB)
    • GF2.0-words-numerals-lemmas-taxonomy-short.tsv (19 MB)
    • GF2.0-words-numerals-lowercase_forms-taxonomy-short.tsv (23 MB)
    • GF2.0-words-numerals-lemmas-taxonomy-entire.tsv (56 MB)
Name
GF2.0-words-conjunctions.zip
Size
31.89 KB
Format
application/zip
Description
Frequency lists of conjunctions in Gigafida 2.0
MD5
ce38989d7debf1cb3594b89acd6a20ef
 File Preview
    • GF2.0-words-conjunctions-lemmas-taxonomy-entire.tsv (139 kB)
    • GF2.0-words-conjunctions-lowercase_forms-taxonomy-entire.tsv (167 kB)
Name
GF2.0-words-prepositions.zip
Size
47.35 KB
Format
application/zip
Description
Frequency lists of prepositions in Gigafida 2.0
MD5
de408eb097b87ca11cf6e3875397cdf6
 File Preview
    • GF2.0-words-prepositions-lemmas-taxonomy-entire.tsv (206 kB)
    • GF2.0-words-prepositions-lowercase_forms-taxonomy-entire.tsv (255 kB)
Name
GF2.0-words-particles.zip
Size
19.02 KB
Format
application/zip
Description
Frequency lists of particles in Gigafida 2.0
MD5
69efee495de91654d7b8709fe4aa021f
 File Preview
    • GF2.0-words-particles-lowercase_forms-taxonomy-entire.tsv (59 kB)
    • GF2.0-words-particles-lemmas-taxonomy-entire.tsv (53 kB)
Name
GF2.0-words-interjections.zip
Size
72.36 KB
Format
application/zip
Description
Frequency lists of interjections in Gigafida 2.0
MD5
333f5680daab84a6c3b96175747ec3d6
 File Preview
    • GF2.0-words-interjections-lemmas-taxonomy-entire.tsv (467 kB)
    • GF2.0-words-interjections-lowercase_forms-taxonomy-entire.tsv (526 kB)
Name
GF2.0-words-abbreviations.zip
Size
142.91 KB
Format
application/zip
Description
Frequency lists of abbreviations in Gigafida 2.0
MD5
7ccc14b0cc0741e91e892a7d770c330f
 File Preview
    • GF2.0-words-abbreviations-lemmas-taxonomy-entire.tsv (843 kB)
    • GF2.0-words-abbreviations-lowercase_forms-taxonomy-entire.tsv (952 kB)
Name
GF2.0-words-residual.zip
Size
14.65 MB
Format
application/zip
Description
Frequency lists of residual words in Gigafida 2.0
MD5
f8393ecbbc7337b938e41cddb992f713
 File Preview
    • GF2.0-words-residual-lowercase_forms-taxonomy-entire.tsv (88 MB)
    • GF2.0-words-residual-lowercase_forms-taxonomy-skrajsan.tsv (23 MB)
    • GF2.0-words-residual-lemmas-taxonomy-entire.tsv (70 MB)
    • GF2.0-words-residual-lemmas-taxonomy-short.tsv (20 MB)
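
Each archive above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal Python sketch of such a check, not something supplied by the repository. The function names `md5_of_file` and `matches_listed_md5` are hypothetical, and the archive is assumed to have been saved locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Return True if the file's MD5 equals the checksum listed in the record."""
    return md5_of_file(path) == listed_md5.strip().lower()
```

For instance, `matches_listed_md5("GF2.0-words-verbs.zip", "628b802ef22b12b1ddc2a1268f76f610")` compares a local copy of the verbs archive against the checksum listed for it. MD5 is suitable here only as an integrity check, not as protection against deliberate tampering.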
