dc.contributor.author Čibej, Jaka
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.date.accessioned 2019-11-13T08:54:49Z
dc.date.available 2019-11-13T08:54:49Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/11356/1272
dc.description Frequency lists of character-level n-grams were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain character n-grams of length 1 to 5 that occur in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files) and lower-case word forms (5 files).
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://slovnica.ijs.si/
dc.subject characters
dc.subject n-grams
dc.subject standard language
dc.subject frequency list
dc.title Frequency lists of character-level n-grams from the Gigafida 2.0 corpus
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej; jaka.cibej@cjvt.si; Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency); J6-8256; New grammar of contemporary standard Slovene: sources and methods; nationalFunds
files.count 2
files.size 77975117
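
The dc.description field above refers to character-level n-grams with absolute and relative frequencies. As a rough illustration of what such a list contains, the following minimal Python sketch counts character n-grams over a list of tokens. It is not the LIST tool's implementation: the function names `char_ngrams` and `ngram_frequencies` are hypothetical, and the sketch assumes that n-grams are taken within each token (no padding, no spans across token boundaries), which the record does not confirm.

```python
from collections import Counter


def char_ngrams(token, n):
    """Return all contiguous character n-grams of one token, in order."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def ngram_frequencies(tokens, n):
    """Map each n-gram to (absolute frequency, relative frequency).

    Assumption: every token is lower-cased and n-grams never cross
    token boundaries. Tokens shorter than n contribute nothing.
    """
    counts = Counter()
    for token in tokens:
        counts.update(char_ngrams(token.lower(), n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: (count, count / total) for gram, count in counts.items()}


if __name__ == "__main__":
    # Two illustrative tokens, not data taken from the corpus.
    for gram, (absolute, relative) in sorted(
        ngram_frequencies(["hiša", "miša"], 2).items()
    ):
        print(gram, absolute, f"{relative:.3f}")
```

The published lists additionally break frequencies down by text type in the corpus taxonomy, which this sketch leaves out.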


 Files in this item

 Download all files in item (74.36 MB)
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name GF2.0-characters-lemmas.zip
Size 45.65 MB
Format application/zip
Description Frequency lists of character-level n-grams from lemmas in Gigafida 2.0
MD5 936786f83df71c161b140b6d9c222f4a
File preview
    • GF2.0-characters-lemmas-3grams-taxonomy-entire.tsv (22 MB)
    • GF2.0-characters-lemmas-4grams-taxonomy-short.tsv (22 MB)
    • GF2.0-characters-lemmas-5grams-taxonomy-entire.tsv (295 MB)
    • GF2.0-characters-lemmas-3grams-taxonomy-short.tsv (19 MB)
    • GF2.0-characters-lemmas-2grams-taxonomy-entire.tsv (1 MB)
    • GF2.0-characters-lemmas-4grams-taxonomy-entire.tsv (106 MB)
    • GF2.0-characters-lemmas-1grams-taxonomy-entire.tsv (81 kB)
    • GF2.0-characters-lemmas-5grams-taxonomy-short.tsv (23 MB)

Name GF2.0-characters-lowercase_forms.zip
Size 28.72 MB
Format application/zip
Description Frequency lists of character-level n-grams from lower-case word forms in Gigafida 2.0
MD5 501a235e38f42e0c75e3b5bc917bda2c
File preview
    • GF2.0-characters-lowercase_forms-5grams-taxonomy-short.tsv (22 MB)
    • GF2.0-characters-lowercase_forms-2grams-taxonomy-entire.tsv (978 kB)
    • GF2.0-characters-lowercase_forms-4grams-taxonomy-entire.tsv (67 MB)
    • GF2.0-characters-lowercase_forms-4grams-taxonomy-short.tsv (21 MB)
    • GF2.0-characters-lowercase_forms-1grams-taxonomy-entire.tsv (57 kB)
    • GF2.0-characters-lowercase_forms-3grams-taxonomy-entire.tsv (11 MB)
    • GF2.0-characters-lowercase_forms-5grams-taxonomy-entire.tsv (220 MB)
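
The record lists an MD5 checksum for each of the two archives. A downloaded copy could be checked against those values along the following lines. This is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify` are hypothetical, and the local path passed to `verify` is whatever location the archive was saved to.

```python
import hashlib

# Checksums exactly as listed in this record.
EXPECTED_MD5 = {
    "GF2.0-characters-lemmas.zip": "936786f83df71c161b140b6d9c222f4a",
    "GF2.0-characters-lowercase_forms.zip": "501a235e38f42e0c75e3b5bc917bda2c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, archive_name):
    """Return True if the file at `path` matches the listed checksum."""
    return md5_of_file(path) == EXPECTED_MD5[archive_name]
```

For example, `verify("GF2.0-characters-lemmas.zip", "GF2.0-characters-lemmas.zip")` would return `True` for an intact download saved in the current directory.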
