dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2020-02-14T18:36:16Z
dc.date.available 2020-02-14T18:36:16Z
dc.date.issued 2020-02-13
dc.identifier.uri http://hdl.handle.net/11356/1289
dc.description The lists contain the consonant-vowel structures of all lemmas and word forms in the Gigafida 2.0 corpus. In each unit, characters were converted as follows: C - consonant (in the lists with fine-grained character categorization, consonants are further divided into Z - sonorant, G - voiced obstruent, and K - voiceless obstruent), V - vowel, X - foreign consonant, Y - foreign vowel, S - symbol, P - punctuation, N - number, F - non-Latin-script character, ! - other. Each consonant-vowel structure is listed with its frequency in the corpus (i.e. the sum of the frequencies of all units corresponding to that structure) and with either the set of all its units (in the lists labeled "entire") or the set of its 30 most frequent units (in the lists labeled "short"), along with their part-of-speech categories and individual frequencies. The lists also give the number of unique units within each consonant-vowel structure. The lists were prepared from frequency lists extracted from Gigafida 2.0 using LIST (http://hdl.handle.net/11356/1276). Note that there is a related resource, "Consonant-vowel structures in the GOS 1.0 corpus" (http://hdl.handle.net/11356/1290).
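The conversion and aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character classes below are a simplified assumption covering only the letters of the Slovene alphabet and digits (the authoritative mapping is the one distributed as GF2.0_character_categorization.tsv), the function names are hypothetical, and the frequencies in the usage example are invented.

```python
from collections import defaultdict

# ASSUMPTION: simplified character classes for illustration only. The real
# mapping is in GF2.0_character_categorization.tsv and also covers foreign
# letters (X, Y), symbols (S), punctuation (P) and non-Latin characters (F).
VOWELS = set("aeiou")
SONORANTS = set("mnrljv")
VOICED_OBSTRUENTS = set("bdgzž")
VOICELESS_OBSTRUENTS = set("ptkfsšhcč")


def categorize(char, finegrained=False):
    """Map one character to its category label."""
    c = char.lower()
    if c in VOWELS:
        return "V"
    if c in SONORANTS:
        return "Z" if finegrained else "C"
    if c in VOICED_OBSTRUENTS:
        return "G" if finegrained else "C"
    if c in VOICELESS_OBSTRUENTS:
        return "K" if finegrained else "C"
    if c.isdigit():
        return "N"
    # Everything this sketch does not model falls back to "other".
    return "!"


def cv_structure(unit, finegrained=False):
    """Convert a lemma or word form to its consonant-vowel structure."""
    return "".join(categorize(ch, finegrained) for ch in unit)


def group_by_structure(frequencies, finegrained=False, top_n=None):
    """Group units by structure.

    frequencies: dict mapping unit -> corpus frequency.
    top_n: keep only the top_n most frequent units per structure
           (30 would mimic the "short" lists; None mimics "entire").
    """
    groups = defaultdict(lambda: {"frequency": 0, "unique_units": 0, "units": []})
    for unit, freq in frequencies.items():
        entry = groups[cv_structure(unit, finegrained)]
        entry["frequency"] += freq      # total frequency of the structure
        entry["unique_units"] += 1      # number of distinct units
        entry["units"].append((unit, freq))
    for entry in groups.values():
        entry["units"].sort(key=lambda item: item[1], reverse=True)
        if top_n is not None:
            entry["units"] = entry["units"][:top_n]
    return dict(groups)


if __name__ == "__main__":
    # Invented frequencies, not taken from Gigafida 2.0.
    toy = {"miza": 120, "roka": 300, "dan": 500, "pes": 80}
    for structure, entry in group_by_structure(toy).items():
        print(structure, entry["frequency"], entry["unique_units"], entry["units"])
```

With the robust categorization, "miza" and "roka" share one structure, while the fine-grained categorization separates them because "z" is a voiced obstruent and "k" a voiceless one.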
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject consonants
dc.subject vowels
dc.subject consonant-vowel structures
dc.subject frequency list
dc.subject Gigafida
dc.subject sonorants
dc.subject obstruents
dc.title Consonant-vowel structures in the Gigafida 2.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
files.count 5
files.size 148630712


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name GF2.0_cv_lemmas_robust.zip
Size 23.74 MB
Format application/zip
Description Consonant-vowel structures of lemmas in Gigafida 2.0 (robust consonant categorization)
MD5 152ea72c125f5e9d3ade83adc1298d48
Contents
    • GF2.0_cv_lemmas_robust_entire.tsv (63 MB)
    • GF2.0_cv_lemmas_robust_short.tsv (24 MB)
Name GF2.0_cv_lemmas_finegrained.zip
Size 37.82 MB
Format application/zip
Description Consonant-vowel structures of lemmas in Gigafida 2.0 (fine-grained consonant categorization)
MD5 638e59479881a3ee1486cc9fb4eeb820
Contents
    • GF2.0_cv_lemmas_finegrained_short.tsv (82 MB)
    • GF2.0_cv_lemmas_finegrained_entire.tsv (102 MB)
Name GF2.0_cv_forms_robust.zip
Size 30.45 MB
Format application/zip
Description Consonant-vowel structures of word forms in Gigafida 2.0 (robust consonant categorization)
MD5 d13c5b7a04821c18781d3fc616b673b0
Contents
    • GF2.0_cv_forms_robust_short.tsv (29 MB)
    • GF2.0_cv_forms_robust_entire.tsv (90 MB)
Name GF2.0_cv_forms_finegrained.zip
Size 49.74 MB
Format application/zip
Description Consonant-vowel structures of word forms in Gigafida 2.0 (fine-grained consonant categorization)
MD5 1e5fd122c322cb82eb3afcf6a14bfd38
Contents
    • GF2.0_cv_forms_finegrained_short.tsv (105 MB)
    • GF2.0_cv_forms_finegrained_entire.tsv (136 MB)
Name GF2.0_character_categorization.tsv
Size 3.06 KB
Format Unknown
Description Categorization of characters in Gigafida 2.0
MD5 c944d2c29567bd486f72a4ee17a45ffd
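The MD5 checksums listed for the five files can be used to check downloads for corruption. The following is a minimal sketch using Python's standard hashlib module; the function names and the assumption that the files sit in a single local directory under their original names are illustrative and not part of the record.

```python
import hashlib
import os

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "GF2.0_cv_lemmas_robust.zip": "152ea72c125f5e9d3ade83adc1298d48",
    "GF2.0_cv_lemmas_finegrained.zip": "638e59479881a3ee1486cc9fb4eeb820",
    "GF2.0_cv_forms_robust.zip": "d13c5b7a04821c18781d3fc616b673b0",
    "GF2.0_cv_forms_finegrained.zip": "1e5fd122c322cb82eb3afcf6a14bfd38",
    "GF2.0_character_categorization.tsv": "c944d2c29567bd486f72a4ee17a45ffd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Compare each expected file in `directory` against its listed checksum.

    Returns a dict mapping file name -> "ok", "mismatch" or "missing".
    ASSUMPTION: files keep their original names in one local directory.
    """
    report = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            report[name] = "missing"
        elif md5_of_file(path) == expected:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report


if __name__ == "__main__":
    for name, status in verify_downloads(".").items():
        print(f"{status:9s} {name}")
```

MD5 is adequate here for detecting transfer errors; it is not a safeguard against deliberate tampering.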
