dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2020-11-02T12:35:03Z
dc.date.available 2020-11-02T12:35:03Z
dc.date.issued 2020-10-28
dc.identifier.uri http://hdl.handle.net/11356/1363
dc.description Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain the character 1-grams to 5-grams occurring in the corpus, together with their absolute and relative frequencies, percentages, and distribution across the text types in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files), lower-case word forms (5 files), and standardized word forms (5 files). Compared to the previous version (http://hdl.handle.net/11356/1268), this version fixes several typos and replaces all instances of the term "normalized forms" with the more adequate "standardized forms" (as used in the SSJ project).
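To illustrate what the frequency columns describe, here is a minimal Python sketch that counts character n-grams in a list of tokens. It is not the LIST tool's implementation. It assumes that n-grams are counted within each token (never across word boundaries), takes relative frequency to mean occurrences per million n-grams, and leaves out the per-text-type distribution; the function name `char_ngram_frequencies` is hypothetical.

```python
from collections import Counter


def char_ngram_frequencies(tokens, n):
    """Count character n-grams of length n inside each token.

    Assumptions (not confirmed by the record): n-grams do not cross token
    boundaries, and relative frequency is expressed per million n-grams.
    """
    counts = Counter()
    for token in tokens:
        # Slide a window of width n over the token; tokens shorter than n
        # contribute nothing.
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1

    total = sum(counts.values())
    rows = []
    for gram, absolute in counts.most_common():
        rows.append({
            "ngram": gram,
            "absolute": absolute,
            "relative_per_million": absolute / total * 1_000_000,
            "percentage": absolute / total * 100,
        })
    return rows
```

For the lower-case form lists, the tokens would be lower-cased first, e.g. `char_ngram_frequencies([w.lower() for w in tokens], 3)`; for the lemma lists, the input would be the lemmas instead of the surface forms.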
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.replaces http://hdl.handle.net/11356/1268
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject spoken corpus
dc.subject frequency list
dc.subject n-grams
dc.subject characters
dc.title Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
size.info 15 files
files.count 1
files.size 2686389


 Files in this item

Name: GOS1.0-characters.zip
Size: 2.56 MB
Format: application/zip
Description: Frequency lists of character-level n-grams from GOS1.0
MD5: 37c3c093d4c8582eb6505c5ce06ab3b8
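A downloaded copy of the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; it assumes the archive was saved locally under its original name, and the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib

# Checksum as listed in this record for GOS1.0-characters.zip.
EXPECTED_MD5 = "37c3c093d4c8582eb6505c5ce06ab3b8"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected (case-insensitive) value."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("GOS1.0-characters.zip")` returns `True` only if the local file is byte-identical to the published archive.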
Archive contents:
  • GOS1.0-characters-lemmas
    • GOS1.0-characters-lemmas-1grams-taxonomy-entire.tsv (12 kB)
    • GOS1.0-characters-lemmas-2grams-taxonomy-entire.tsv (145 kB)
    • GOS1.0-characters-lemmas-3grams-taxonomy-entire.tsv (890 kB)
    • GOS1.0-characters-lemmas-4grams-taxonomy-entire.tsv (2 MB)
    • GOS1.0-characters-lemmas-5grams-taxonomy-entire.tsv (4 MB)
  • GOS1.0-characters-lowercase_forms
    • GOS1.0-characters-lowercase_forms-1grams-taxonomy-entire.tsv (7 kB)
    • GOS1.0-characters-lowercase_forms-2grams-taxonomy-entire.tsv (77 kB)
    • GOS1.0-characters-lowercase_forms-3grams-taxonomy-entire.tsv (753 kB)
    • GOS1.0-characters-lowercase_forms-4grams-taxonomy-entire.tsv (3 MB)
    • GOS1.0-characters-lowercase_forms-5grams-taxonomy-entire.tsv (6 MB)
  • GOS1.0-characters-standardized_forms
    • GOS1.0-characters-standardized_forms-1grams-taxonomy-entire.tsv (11 kB)
    • GOS1.0-characters-standardized_forms-2grams-taxonomy-entire.tsv (141 kB)
    • GOS1.0-characters-standardized_forms-3grams-taxonomy-entire.tsv (889 kB)
    • GOS1.0-characters-standardized_forms-4grams-taxonomy-entire.tsv (3 MB)
    • GOS1.0-characters-standardized_forms-5grams-taxonomy-entire.tsv (5 MB)
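The 15 TSV files follow a regular naming scheme (form type, then n-gram length), so they can be indexed and read directly from the archive without unpacking it. This is a minimal sketch using Python's standard `zipfile` and `csv` modules. The regular expression is inferred from the file names listed above; the helper names are hypothetical; and it assumes UTF-8 encoding, unquoted tab-separated fields, and a single header line at the top of each file. If the files begin with additional metadata lines, those would need to be skipped first.

```python
import csv
import io
import re
import zipfile

# Hypothetical pattern inferred from the listed file names; the leading
# (?:^|/) avoids matching macOS "._" resource-fork copies.
NAME_RE = re.compile(
    r"(?:^|/)GOS1\.0-characters-(lemmas|lowercase_forms|standardized_forms)"
    r"-(\d)grams-taxonomy-entire\.tsv$"
)


def index_tsv_members(zip_path):
    """Map (form_type, n) to the archive member name of each matching TSV."""
    index = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            match = NAME_RE.search(name)
            if match:
                index[(match.group(1), int(match.group(2)))] = name
    return index


def read_tsv(zip_path, member):
    """Return (header, rows) of one TSV member as lists of strings."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: n-grams may legitimately contain quote characters.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            header = next(reader, [])
            rows = list(reader)
    return header, rows
```

Usage: `idx = index_tsv_members("GOS1.0-characters.zip")` followed by `read_tsv("GOS1.0-characters.zip", idx[("lemmas", 3)])` would load the lemma trigram list. Column names are taken from the file itself rather than hard-coded, since this record does not spell them out.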
