dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2020-11-02T12:40:06Z
dc.date.available 2020-11-02T12:40:06Z
dc.date.issued 2020-10-28
dc.identifier.uri http://hdl.handle.net/11356/1367
dc.description The lists contain the consonant-vowel structures of all lemmas, word forms, and standardized word forms in the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040). The characters of each unit were converted as follows: C - consonant (in the lists with fine-grained character categorization, consonants are further divided into Z - sonorant, G - voiced obstruent, and K - voiceless obstruent), V - vowel, X - foreign consonant, Y - foreign vowel, S - symbol, P - punctuation, N - number, F - non-Latin-script character, ! - other. Each consonant-vowel structure is listed with its frequency in the corpus (i.e. the sum of the frequencies of all units that correspond to it), the number of unique units it covers, and either the set of all its units (in the lists labeled "entire") or its 30 most frequent units (in the lists labeled "short"), along with their part-of-speech categories and individual frequencies. The lists were prepared from frequency lists extracted from GOS 1.0 using LIST (http://hdl.handle.net/11356/1276). A related resource is "Consonant-vowel structures in the Gigafida 2.0 corpus" (http://hdl.handle.net/11356/1289). Compared to the previous version (http://hdl.handle.net/11356/1290), this version fixes several typos and replaces all instances of "normalized forms" with the more adequate term "standardized forms" (as used in the SSJ project).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.replaces http://hdl.handle.net/11356/1290
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject consonants
dc.subject vowels
dc.subject consonant-vowel structures
dc.subject GOS
dc.subject spoken Slovene
dc.subject sonorants
dc.subject obstruents
dc.subject frequency list
dc.title Consonant-vowel structures in the GOS 1.0 corpus 1.1
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
files.count 7
files.size 3773628
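
The conversion described in dc.description can be sketched roughly as follows. This is a minimal illustration, not the procedure used to build the lists: the character sets below are assumptions made for the example (the authoritative mapping is in GOS1.0_character_categorization.tsv), and the names `categorize`, `cv_structure`, and `group_by_structure` are hypothetical helpers.

```python
import unicodedata
from collections import Counter, defaultdict

# ASSUMPTION: illustrative character sets only. The resource's actual
# mapping is distributed in GOS1.0_character_categorization.tsv.
VOWELS = set("aeiou")
SONORANTS = set("mnrljv")
VOICED_OBSTRUENTS = set("bdgzž")
VOICELESS_OBSTRUENTS = set("ptkfsšcčh")
FOREIGN_CONSONANTS = set("qwx")
FOREIGN_VOWELS = set("yäöü")


def categorize(ch, finegrained=False):
    """Map one character to a category code (C/Z/G/K, V, X, Y, S, P, N, F, !)."""
    c = ch.lower()
    if c in VOWELS:
        return "V"
    if c in SONORANTS:
        return "Z" if finegrained else "C"
    if c in VOICED_OBSTRUENTS:
        return "G" if finegrained else "C"
    if c in VOICELESS_OBSTRUENTS:
        return "K" if finegrained else "C"
    if c in FOREIGN_CONSONANTS:
        return "X"
    if c in FOREIGN_VOWELS:
        return "Y"
    general = unicodedata.category(ch)
    if general.startswith("N"):
        return "N"  # number
    if general.startswith("P"):
        return "P"  # punctuation
    if general.startswith("S"):
        return "S"  # symbol
    if general.startswith("L") and "LATIN" not in unicodedata.name(ch, ""):
        return "F"  # letter from a non-Latin script
    return "!"      # anything else


def cv_structure(unit, finegrained=False):
    """Convert a lemma or word form into its consonant-vowel structure."""
    return "".join(categorize(ch, finegrained) for ch in unit)


def group_by_structure(unit_freqs, finegrained=False, top_n=None):
    """Group (unit, frequency) pairs by structure.

    top_n=None keeps all units (as in the "entire" lists);
    top_n=30 would keep the 30 most frequent (as in the "short" lists).
    """
    groups = defaultdict(Counter)
    for unit, freq in unit_freqs:
        groups[cv_structure(unit, finegrained)][unit] += freq
    return {
        structure: {
            "frequency": sum(counts.values()),
            "unique_units": len(counts),
            "units": counts.most_common(top_n),
        }
        for structure, counts in groups.items()
    }


print(cv_structure("hiša"))                    # CVCV
print(cv_structure("hiša", finegrained=True))  # KVKV
```

Part-of-speech categories are left out of the sketch for brevity; in the published lists each unit also carries its part-of-speech category.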


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Name: GOS1.0_cv_forms_robust.zip
Size: 546.99 KB
Format: application/zip
Description: Consonant-vowel structures of word forms in GOS 1.0 (robust consonant categorization)
MD5: eb7ed86ef861c2eeae526b12efe0544b
Contents:
    • GOS1.0_cv_forms_robust_entire.tsv (1 MB)
    • GOS1.0_cv_forms_robust_short.tsv (587 kB)

Name: GOS1.0_cv_forms_finegrained.zip
Size: 963.23 KB
Format: application/zip
Description: Consonant-vowel structures of word forms in GOS 1.0 (finegrained consonant categorization)
MD5: 3e2c7c5e35e1e3ae9aa1b63409a1c890
Contents:
    • GOS1.0_cv_forms_finegrained_short.tsv (2 MB)
    • GOS1.0_cv_forms_finegrained_entire.tsv (2 MB)

Name: GOS1.0_cv_lemmas_robust.zip
Size: 285.35 KB
Format: application/zip
Description: Consonant-vowel structures of lemmas in GOS 1.0 (robust consonant categorization)
MD5: 38d13830dcf856e2a31d9965bbf26027
Contents:
    • GOS1.0_cv_lemmas_robust_short.tsv (392 kB)
    • GOS1.0_cv_lemmas_robust_entire.tsv (566 kB)

Name: GOS1.0_cv_lemmas_finegrained.zip
Size: 485.04 KB
Format: application/zip
Description: Consonant-vowel structures of lemmas in GOS 1.0 (finegrained consonant categorization)
MD5: 3e7665971a410db79c000e85a7def591
Contents:
    • GOS1.0_cv_lemmas_finegrained_short.tsv (1 MB)
    • GOS1.0_cv_lemmas_finegrained_entire.tsv (1 MB)

Name: GOS1.0_cv_stand_robust.zip
Size: 514.04 KB
Format: application/zip
Description: Consonant-vowel structures of standardized word forms in GOS 1.0 (robust consonant categorization)
MD5: 7f5aa33462681a32d2cf33d650746c9a
Contents:
    • GOS1.0_cv_stand_robust_short.tsv (609 kB)
    • GOS1.0_cv_stand_robust_entire.tsv (1 MB)

Name: GOS1.0_cv_stand_finegrained.zip
Size: 887.47 KB
Format: application/zip
Description: Consonant-vowel structures of standardized word forms in GOS 1.0 (finegrained consonant categorization)
MD5: b142c0c1f34944874733b544dec5c2b4
Contents:
    • GOS1.0_cv_stand_finegrained_entire.tsv (2 MB)
    • GOS1.0_cv_stand_finegrained_short.tsv (2 MB)

Name: GOS1.0_character_categorization.tsv
Size: 3.06 KB
Format: Unknown
Description: Categorization of characters in GOS 1.0
MD5: c944d2c29567bd486f72a4ee17a45ffd
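
The MD5 values listed for the files can be used to check downloads for corruption. The following is a minimal sketch, assuming the seven files have been saved under their original names into a single local directory; the helper names `md5_of` and `verify` and the directory argument are hypothetical, while the checksums are copied from this record.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "GOS1.0_cv_forms_robust.zip": "eb7ed86ef861c2eeae526b12efe0544b",
    "GOS1.0_cv_forms_finegrained.zip": "3e2c7c5e35e1e3ae9aa1b63409a1c890",
    "GOS1.0_cv_lemmas_robust.zip": "38d13830dcf856e2a31d9965bbf26027",
    "GOS1.0_cv_lemmas_finegrained.zip": "3e7665971a410db79c000e85a7def591",
    "GOS1.0_cv_stand_robust.zip": "7f5aa33462681a32d2cf33d650746c9a",
    "GOS1.0_cv_stand_finegrained.zip": "b142c0c1f34944874733b544dec5c2b4",
    "GOS1.0_character_categorization.tsv": "c944d2c29567bd486f72a4ee17a45ffd",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # ASSUMPTION: the downloads are in the current working directory.
    for name, ok in verify(".").items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

A missing file is reported the same way as a mismatch, so an incomplete download shows up in the same pass.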
