dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Krek, Simon |
dc.date.accessioned | 2020-02-14T18:36:16Z |
dc.date.available | 2020-02-14T18:36:16Z |
dc.date.issued | 2020-02-13 |
dc.identifier.uri | http://hdl.handle.net/11356/1289 |
dc.description | The lists contain the consonant-vowel structures of all lemmas and word forms in the Gigafida 2.0 corpus. In each unit, the characters were converted as follows: C - consonant (in the lists with fine-grained character categorization, consonants are further divided into Z - sonorant, G - voiced obstruent, and K - voiceless obstruent), V - vowel, X - foreign consonant, Y - foreign vowel, S - symbol, P - punctuation, N - number, F - non-Latin-script character, ! - other. For each consonant-vowel structure, the lists give its frequency in the corpus (i.e. the sum of the frequencies of all units with that structure) and the number of unique units with that structure. They also give either the full set of units (in the lists labeled "entire") or the 30 most frequent units (in the lists labeled "short"), each with its part-of-speech category and individual frequency. The lists were prepared from frequency lists extracted from Gigafida 2.0 using LIST: http://hdl.handle.net/11356/1276. A related resource, "Consonant-vowel structures in the GOS 1.0 corpus", is available at http://hdl.handle.net/11356/1290. |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | consonants |
dc.subject | vowels |
dc.subject | consonant-vowel structures |
dc.subject | frequency list |
dc.subject | sonorants |
dc.subject | obstruents |
dc.title | Consonant-vowel structures in the Gigafida 2.0 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
files.count | 5 |
files.size | 148630712 |
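The conversion and aggregation described in the dc.description field can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the letter classes below are a hypothetical simplification (the authoritative mapping is the one in GF2.0_character_categorization.tsv), the input is assumed to be (unit, part of speech, frequency) triples from a frequency list, and "unique units" is assumed to mean unique (unit, part-of-speech) pairs. The names char_category, cv_structure and aggregate are illustrative only.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical letter classes for Slovene (ASSUMPTION, not taken from the
# resource; the real mapping is in GF2.0_character_categorization.tsv).
VOWELS = set("aeiou")                     # V
SONORANTS = set("mnlrjv")                 # Z (fine-grained)
VOICED_OBSTRUENTS = set("bdgzž")          # G (fine-grained)
VOICELESS_OBSTRUENTS = set("ptksšcčfh")   # K (fine-grained)
CONSONANTS = SONORANTS | VOICED_OBSTRUENTS | VOICELESS_OBSTRUENTS  # C
FOREIGN_CONSONANTS = set("qwx")           # X (assumed)
FOREIGN_VOWELS = set("yäöü")              # Y (assumed)


def char_category(ch: str, finegrained: bool = False) -> str:
    """Map one character to its category label."""
    c = ch.lower()
    if c in VOWELS:
        return "V"
    if c in CONSONANTS:
        if not finegrained:
            return "C"
        if c in SONORANTS:
            return "Z"
        return "G" if c in VOICED_OBSTRUENTS else "K"
    if c in FOREIGN_VOWELS:
        return "Y"
    if c in FOREIGN_CONSONANTS:
        return "X"
    if ch.isdigit():
        return "N"
    general = unicodedata.category(ch)  # Unicode general category, e.g. "Po", "Sc"
    if general.startswith("P"):
        return "P"
    if general.startswith("S"):
        return "S"
    if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
        return "F"  # letter from a non-Latin script
    return "!"      # anything else, incl. unlisted Latin letters (simplification)


def cv_structure(unit: str, finegrained: bool = False) -> str:
    """Convert a lemma or word form into its consonant-vowel structure."""
    return "".join(char_category(ch, finegrained) for ch in unit)


def aggregate(freq_list, finegrained=False, top_n=None):
    """Group (unit, pos, freq) triples by consonant-vowel structure.

    top_n=None keeps all units (like an "entire" list);
    top_n=30 mimics a "short" list.
    """
    groups = defaultdict(Counter)
    for unit, pos, freq in freq_list:
        groups[cv_structure(unit, finegrained)][(unit, pos)] += freq
    result = {}
    for structure, units in groups.items():
        result[structure] = {
            "frequency": sum(units.values()),  # sum over ALL units, even if truncated
            "unique_units": len(units),        # unique (unit, pos) pairs
            "units": units.most_common(top_n),
        }
    return result
```

Under these assumed classes, cv_structure("miza") gives "CVCV" and cv_structure("miza", finegrained=True) gives "ZVGV"; aggregating [("miza", "noun", 120), ("hiša", "noun", 300)] yields a single "CVCV" structure with frequency 420 and two unique units. Note that the total frequency and unit count are computed before the top_n cut, which matches the description of the "short" lists keeping only the 30 most frequent units as examples.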
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name | Size | Format | Description | MD5 |
GF2.0_cv_lemmas_robust.zip | 23.74 MB | application/zip | Consonant-vowel structures of lemmas in Gigafida 2.0 (robust consonant categorization) | 152ea72c125f5e9d3ade83adc1298d48 |
GF2.0_cv_lemmas_finegrained.zip | 37.82 MB | application/zip | Consonant-vowel structures of lemmas in Gigafida 2.0 (fine-grained consonant categorization) | 638e59479881a3ee1486cc9fb4eeb820 |
GF2.0_cv_forms_robust.zip | 30.45 MB | application/zip | Consonant-vowel structures of word forms in Gigafida 2.0 (robust consonant categorization) | d13c5b7a04821c18781d3fc616b673b0 |
GF2.0_cv_forms_finegrained.zip | 49.74 MB | application/zip | Consonant-vowel structures of word forms in Gigafida 2.0 (fine-grained consonant categorization) | 1e5fd122c322cb82eb3afcf6a14bfd38 |
GF2.0_character_categorization.tsv | 3.06 KB | Unknown | Categorization of characters in Gigafida 2.0 | c944d2c29567bd486f72a4ee17a45ffd |
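The MD5 checksums listed for the files in this item can be used to check that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard hashlib module; the checksum values are copied from this item, while the helper names md5_of and verify, and the assumption that all files sit in one local directory, are illustrative.

```python
import hashlib
from pathlib import Path

# Checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "GF2.0_cv_lemmas_robust.zip": "152ea72c125f5e9d3ade83adc1298d48",
    "GF2.0_cv_lemmas_finegrained.zip": "638e59479881a3ee1486cc9fb4eeb820",
    "GF2.0_cv_forms_robust.zip": "d13c5b7a04821c18781d3fc616b673b0",
    "GF2.0_cv_forms_finegrained.zip": "1e5fd122c322cb82eb3afcf6a14bfd38",
    "GF2.0_character_categorization.tsv": "c944d2c29567bd486f72a4ee17a45ffd",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory="."):
    """Return "ok", "mismatch" or "missing" for each expected file (hypothetical helper)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.exists():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(path) == expected else "mismatch"
    return results
```

Calling verify("downloads") on a directory holding the five files should report "ok" for each intact download; any "mismatch" suggests the file should be fetched again.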