Show simple item record

 
dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2020-02-14T18:36:16Z
dc.date.available 2020-02-14T18:36:16Z
dc.date.issued 2020-02-13
dc.identifier.uri http://hdl.handle.net/11356/1289
dc.description The lists contain the consonant-vowel structures of all lemmas and word forms in the Gigafida 2.0 corpus. The characters of each unit were converted as follows: C - consonant (in the lists with fine-grained character categorization, consonants are further divided into Z - sonorant, G - voiced obstruent, and K - voiceless obstruent), V - vowel, X - foreign consonant, Y - foreign vowel, S - symbol, P - punctuation, N - number, F - non-Latin-script character, ! - other. Each consonant-vowel structure is listed with its frequency in the corpus (i.e. the sum of the frequencies of all units that correspond to the structure), the number of unique units that correspond to it, and either the set of all such units (in the lists labeled "entire") or the set of its 30 most frequent units (in the lists labeled "short"), each unit given with its part-of-speech category and individual frequency. The lists were prepared from frequency lists extracted from Gigafida 2.0 using LIST (http://hdl.handle.net/11356/1276). A related resource, "Consonant-vowel structures in the GOS 1.0 corpus", is available at http://hdl.handle.net/11356/1290.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject consonants
dc.subject vowels
dc.subject consonant-vowel structures
dc.subject frequency list
dc.subject sonorants
dc.subject obstruents
dc.title Consonant-vowel structures in the Gigafida 2.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
files.count 5
files.size 148630712
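
The conversion and aggregation described in dc.description can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the character tables below are assumptions made for the example (the authoritative table is the file GF2.0_character_categorization.tsv in this item), and the names `categorize`, `cv_structure`, and `aggregate` are hypothetical. The input is assumed to be a frequency list of unique (unit, part of speech, frequency) entries.

```python
import unicodedata
from collections import defaultdict

# ASSUMPTION: illustrative character tables, not the published categorization
# (the authoritative table ships as GF2.0_character_categorization.tsv).
VOWELS = set("aeiou")
SONORANTS = set("mnrlvj")
VOICED_OBSTRUENTS = set("bdgzž")
VOICELESS_OBSTRUENTS = set("ptkfsšcčh")
FOREIGN_CONSONANTS = set("qwxćđ")
FOREIGN_VOWELS = set("yäöü")
PUNCTUATION = set(".,;:!?-'\"()")
SYMBOLS = set("@#$%&*+=/<>")


def categorize(char, finegrained=False):
    """Map one character to its category label."""
    c = char.lower()
    if len(c) != 1:  # rare case: lowercasing expands to several characters
        c = char
    if c in VOWELS:
        return "V"
    if c in SONORANTS:
        return "Z" if finegrained else "C"
    if c in VOICED_OBSTRUENTS:
        return "G" if finegrained else "C"
    if c in VOICELESS_OBSTRUENTS:
        return "K" if finegrained else "C"
    if c in FOREIGN_CONSONANTS:
        return "X"
    if c in FOREIGN_VOWELS:
        return "Y"
    if c.isdigit():
        return "N"
    if c in PUNCTUATION:
        return "P"
    if c in SYMBOLS:
        return "S"
    # Letters outside the Latin script count as F; anything else is "!".
    if c.isalpha() and "LATIN" not in unicodedata.name(c, ""):
        return "F"
    return "!"


def cv_structure(unit, finegrained=False):
    """Convert a lemma or word form to its consonant-vowel structure."""
    return "".join(categorize(ch, finegrained) for ch in unit)


def aggregate(entries, finegrained=False, top_n=None):
    """Group (unit, pos, frequency) entries by consonant-vowel structure.

    top_n=None keeps all units (as in the "entire" lists);
    top_n=30 keeps the 30 most frequent units (as in the "short" lists).
    """
    groups = defaultdict(list)
    for unit, pos, freq in entries:
        groups[cv_structure(unit, finegrained)].append((unit, pos, freq))
    result = {}
    for structure, units in groups.items():
        units.sort(key=lambda entry: entry[2], reverse=True)
        result[structure] = {
            "frequency": sum(freq for _, _, freq in units),
            "unique_units": len(units),
            "units": units if top_n is None else units[:top_n],
        }
    return result
```

With these assumed tables, `cv_structure("hiša")` gives `CVCV` under the robust categorization and `KVKV` under the fine-grained one. The column layout of the published .tsv files is not described in this record, so the sketch returns a plain dictionary instead of imitating a file format.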


Files in this item

Download all files in this item (141.75 MB)

Name
GF2.0_cv_lemmas_robust.zip
Size
23.74 MB
Format
application/zip
Description
Consonant-vowel structures of lemmas in Gigafida 2.0 (robust consonant categorization)
MD5
152ea72c125f5e9d3ade83adc1298d48
File preview
    • GF2.0_cv_lemmas_robust_entire.tsv (63 MB)
    • GF2.0_cv_lemmas_robust_short.tsv (24 MB)

Name
GF2.0_cv_lemmas_finegrained.zip
Size
37.82 MB
Format
application/zip
Description
Consonant-vowel structures of lemmas in Gigafida 2.0 (fine-grained consonant categorization)
MD5
638e59479881a3ee1486cc9fb4eeb820
File preview
    • GF2.0_cv_lemmas_finegrained_short.tsv (82 MB)
    • GF2.0_cv_lemmas_finegrained_entire.tsv (102 MB)

Name
GF2.0_cv_forms_robust.zip
Size
30.45 MB
Format
application/zip
Description
Consonant-vowel structures of word forms in Gigafida 2.0 (robust consonant categorization)
MD5
d13c5b7a04821c18781d3fc616b673b0
File preview
    • GF2.0_cv_forms_robust_short.tsv (29 MB)
    • GF2.0_cv_forms_robust_entire.tsv (90 MB)

Name
GF2.0_cv_forms_finegrained.zip
Size
49.74 MB
Format
application/zip
Description
Consonant-vowel structures of word forms in Gigafida 2.0 (fine-grained consonant categorization)
MD5
1e5fd122c322cb82eb3afcf6a14bfd38
File preview
    • GF2.0_cv_forms_finegrained_short.tsv (105 MB)
    • GF2.0_cv_forms_finegrained_entire.tsv (136 MB)

Name
GF2.0_character_categorization.tsv
Size
3.06 KB
Format
Unknown
Description
Categorization of characters in Gigafida 2.0
MD5
c944d2c29567bd486f72a4ee17a45ffd
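
The MD5 values listed for the files above can be used to check that a download arrived intact. The sketch below is one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with the value listed in the record."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (assumes the archive was saved to the current directory):
# verify_download("GF2.0_cv_lemmas_robust.zip",
#                 "152ea72c125f5e9d3ade83adc1298d48")
```

MD5 is adequate here for detecting transfer errors; it is not a safeguard against deliberate tampering.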
