
 
dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2019-11-13T08:51:25Z
dc.date.available 2019-11-13T08:51:25Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/11356/1268
dc.description Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain 1- to 5-gram character combinations occurring in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files), lower-case word forms (5 files), and normalized word forms (5 files).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.isreplacedby http://hdl.handle.net/11356/1363
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject spoken corpus
dc.subject frequency list
dc.subject n-grams
dc.subject characters
dc.title Frequency lists of character-level n-grams from the GOS 1.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
size.info 15 files
files.count 1
files.size 2686486
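The description above says each list records character n-grams (n = 1 to 5) with their absolute and relative frequencies. The Python sketch below shows one way such counts could be computed; it is not the LIST tool's implementation. It assumes that n-grams are taken within individual word tokens (never across spaces), that tokens have already been lowercased, lemmatized, or normalized as appropriate, and that "relative frequency" means a count divided by the total number of n-grams of the same length. The function names and sample tokens are hypothetical.

```python
from collections import Counter


def char_ngrams(word: str, n: int) -> list[str]:
    """Return all contiguous character n-grams of length n in one word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_frequencies(words: list[str], n: int) -> dict[str, tuple[int, float]]:
    """Map each n-gram to (absolute frequency, relative frequency).

    Assumption: relative frequency = count / total number of n-grams of length n.
    Tokens are assumed to be preprocessed already (e.g. lowercased).
    """
    counts = Counter(g for w in words for g in char_ngrams(w, n))
    total = sum(counts.values())
    # most_common() orders the result from the most to the least frequent n-gram
    return {g: (c, c / total) for g, c in counts.most_common()}


if __name__ == "__main__":
    tokens = ["dober", "dan", "dobro"]  # hypothetical sample tokens
    for gram, (absolute, relative) in ngram_frequencies(tokens, 2).items():
        print(gram, absolute, f"{relative:.3f}")
```

With the sample tokens, the bigram "do" occurs twice among ten bigrams, so `ngram_frequencies(tokens, 2)["do"]` is `(2, 0.2)`.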


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: GOS1.0-characters.zip
Size: 2.56 MB
Format: application/zip
Description: Frequency lists of character-level n-grams from GOS1.0
MD5: 634e21e32209f0b0af5d27a0d50a5f62
Contents of GOS1.0-characters.zip:
  • GOS1.0-characters-normalized_forms
    • GOS1.0-characters-normalized_forms-1grams-taxonomy-entire.tsv (11 kB)
    • GOS1.0-characters-normalized_forms-2grams-taxonomy-entire.tsv (141 kB)
    • GOS1.0-characters-normalized_forms-3grams-taxonomy-entire.tsv (889 kB)
    • GOS1.0-characters-normalized_forms-4grams-taxonomy-entire.tsv (3 MB)
    • GOS1.0-characters-normalized_forms-5grams-taxonomy-entire.tsv (5 MB)
  • GOS1.0-characters-lemmas
    • GOS1.0-characters-lemmas-1grams-taxonomy-entire.tsv (12 kB)
    • GOS1.0-characters-lemmas-2grams-taxonomy-entire.tsv (145 kB)
    • GOS1.0-characters-lemmas-3grams-taxonomy-entire.tsv (890 kB)
    • GOS1.0-characters-lemmas-4grams-taxonomy-entire.tsv (2 MB)
    • GOS1.0-characters-lemmas-5grams-taxonomy-entire.tsv (4 MB)
  • GOS1.0-characters-lowercase_forms
    • GOS1.0-characters-lowercase_forms-1grams-taxonomy-entire.tsv (7 kB)
    • GOS1.0-characters-lowercase_forms-2grams-taxonomy-entire.tsv (77 kB)
    • GOS1.0-characters-lowercase_forms-3grams-taxonomy-entire.tsv (753 kB)
    • GOS1.0-characters-lowercase_forms-4grams-taxonomy-entire.tsv (3 MB)
    • GOS1.0-characters-lowercase_forms-5grams-taxonomy-entire.tsv (6 MB)
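The 15 lists are tab-separated files inside a single archive. The Python sketch below reads one of them straight from the zip without extracting it. The column layout is not documented in this record, so the sketch assumes UTF-8 text whose first row is a header and simply returns that row as-is; quoting is disabled because n-grams may themselves contain quote characters. The function name is hypothetical.

```python
import csv
import io
import zipfile


def read_tsv_from_zip(zip_source, member_suffix):
    """Return (header, rows) for the first archive member whose name ends with member_suffix.

    Assumptions: UTF-8 encoding, tab delimiter, first row is a header.
    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist() if n.endswith(member_suffix)]
        if not names:
            raise KeyError(f"no member ending in {member_suffix!r}")
        with archive.open(names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps characters such as '"' as literal n-gram content
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            header = next(reader)
            rows = list(reader)
    return header, rows
```

For example, `read_tsv_from_zip("GOS1.0-characters.zip", "lemmas-2grams-taxonomy-entire.tsv")` would return the header and rows of the lemma bigram list, provided the assumptions above hold for the actual files.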
