dc.contributor.author | Čibej, Jaka |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Dobrovoljc, Kaja |
dc.contributor.author | Krek, Simon |
dc.date.accessioned | 2019-11-13T08:54:49Z |
dc.date.available | 2019-11-13T08:54:49Z |
dc.date.issued | 2019-11-18 |
dc.identifier.uri | http://hdl.handle.net/11356/1272 |
dc.description | Frequency lists of character-level n-grams were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain character n-grams of lengths 1 to 5 occurring in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files) and lower-case word forms (5 files). |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://slovnica.ijs.si/ |
dc.subject | characters |
dc.subject | n-grams |
dc.subject | standard language |
dc.subject | frequency list |
dc.title | Frequency lists of character-level n-grams from the Gigafida 2.0 corpus |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-8256 New grammar of contemporary standard Slovene: sources and methods nationalFunds |
files.count | 2 |
files.size | 77975117 |
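
The description above mentions character-level n-grams with absolute and relative frequencies and percentages. The following is a minimal sketch of how such figures can be computed, not the LIST tool's actual implementation. It assumes that n-grams are taken within single tokens (not across word boundaries) and that relative frequency is expressed per million n-grams; both are assumptions, and the function names `char_ngrams` and `ngram_frequencies` are hypothetical.

```python
from collections import Counter


def char_ngrams(token, n):
    """Return all contiguous character n-grams of a single token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def ngram_frequencies(tokens, n):
    """Count character n-grams over lower-cased tokens.

    Assumption: relative frequency is reported per million n-grams;
    the published lists may use a different normalisation.
    """
    counts = Counter()
    for token in tokens:
        counts.update(char_ngrams(token.lower(), n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        gram: {
            "absolute": count,
            "per_million": count / total * 1_000_000,
            "percentage": count / total * 100,
        }
        for gram, count in counts.items()
    }


if __name__ == "__main__":
    # Toy input; the real lists were built from the whole corpus.
    for gram, stats in sorted(ngram_frequencies(["Hiša", "miša"], 2).items()):
        print(gram, stats)
```

Tokens shorter than `n` contribute no n-grams, so one-letter words appear only in the 1-gram counts under this sketch.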
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

- Name: GF2.0-characters-lemmas.zip
- Size: 45.65 MB
- Format: application/zip
- Description: Frequency lists of character-level n-grams from lemmas in Gigafida 2.0
- MD5: 936786f83df71c161b140b6d9c222f4a
- Archive contents:
  - GF2.0-characters-lemmas-3grams-taxonomy-entire.tsv (22 MB)
  - GF2.0-characters-lemmas-4grams-taxonomy-short.tsv (22 MB)
  - GF2.0-characters-lemmas-5grams-taxonomy-entire.tsv (295 MB)
  - GF2.0-characters-lemmas-3grams-taxonomy-short.tsv (19 MB)
  - GF2.0-characters-lemmas-2grams-taxonomy-entire.tsv (1 MB)
  - GF2.0-characters-lemmas-4grams-taxonomy-entire.tsv (106 MB)
  - GF2.0-characters-lemmas-1grams-taxonomy-entire.tsv (81 kB)
  - GF2.0-characters-lemmas-5grams-taxonomy-short.tsv (23 MB)

- Name: GF2.0-characters-lowercase_forms.zip
- Size: 28.72 MB
- Format: application/zip
- Description: Frequency lists of character-level n-grams from lower-case word forms in Gigafida 2.0
- MD5: 501a235e38f42e0c75e3b5bc917bda2c
- Archive contents:
  - GF2.0-characters-lowercase_forms-5grams-taxonomy-short.tsv (22 MB)
  - GF2.0-characters-lowercase_forms-2grams-taxonomy-entire.tsv (978 kB)
  - GF2.0-characters-lowercase_forms-4grams-taxonomy-entire.tsv (67 MB)
  - GF2.0-characters-lowercase_forms-4grams-taxonomy-short.tsv (21 MB)
  - GF2.0-characters-lowercase_forms-1grams-taxonomy-entire.tsv (57 kB)
  - GF2.0-characters-lowercase_forms-3grams-taxonomy-entire.tsv (11 MB)
  - GF2.0-characters-lowercase_forms-5grams-taxonomy-entire.tsv (220 MB)
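
The MD5 checksums listed for the two archives can be used to check a download for corruption. The following is a small sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the sketch assumes the archives were saved locally under their original file names.

```python
import hashlib

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "GF2.0-characters-lemmas.zip": "936786f83df71c161b140b6d9c222f4a",
    "GF2.0-characters-lowercase_forms.zip": "501a235e38f42e0c75e3b5bc917bda2c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("GF2.0-characters-lemmas.zip", "GF2.0-characters-lemmas.zip")` would return `True` for an intact copy saved in the current directory (the local path is an assumption).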