Response date: 2026-06-22T00:38:03Z
Request: http://www.clarin.si/repository/oai/request

Identifier: oai:www.clarin.si:11356/1275
Datestamp: 2023-03-27T17:01:19Z
Sets: hdl_11356_1023, hdl_11356_1024

Title: Frequency lists of word parts from the Gigafida 2.0 corpus
Authors: Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja; Krek, Simon
Keywords: word parts; morphology; standard language; frequency list; initial part of the word; final part of the word

Description: Frequency lists of words split into word parts were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas or lower-case word forms occurring in the corpus, split into their initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest of the word. In addition, the lists also contain absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy.

The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 20 lists were extracted: 1) 10 lists for initial or final word parts extracted from lemmas, 2) 10 lists for initial or final word parts extracted from lower-case word forms. In addition, 20 lists were extracted from all words (regardless of their part-of-speech category). For easier processing in statistical analysis software, shortened versions of longer lists were made containing the first 150,000 lines.
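The splitting scheme described above can be illustrated with a minimal Python sketch. It is not the LIST tool's implementation: the names `split_word`, `part_frequencies`, and `LIST_CONFIGS` are hypothetical, the percentage is assumed to be the share of all counted tokens, and the handling of words shorter than the requested part length is an assumption, since the record does not specify either.

```python
from collections import Counter
from itertools import product

# Parameters taken from the description: parts of 1-5 characters,
# taken from the start or the end of a lemma or lower-case word form.
LENGTHS = (1, 2, 3, 4, 5)
POSITIONS = ("initial", "final")
UNITS = ("lemma", "lowercase_form")  # hypothetical labels


def split_word(word, n, position):
    """Split a word into (part, rest) for an initial or final part of n characters.

    Assumption: a word shorter than n becomes the part in full, with an empty rest.
    """
    if position == "initial":
        return word[:n], word[n:]
    if position == "final":
        return word[-n:], word[:-n]
    raise ValueError(f"unknown position: {position!r}")


def part_frequencies(words, n, position):
    """Return (part, absolute frequency, percentage) rows, most frequent first.

    Assumption: the percentage is relative to the total number of words counted.
    """
    counts = Counter(split_word(w, n, position)[0] for w in words)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(part, freq, 100.0 * freq / total) for part, freq in counts.most_common()]


# 2 units x 2 positions x 5 lengths = 20 lists per part-of-speech category.
LIST_CONFIGS = list(product(UNITS, POSITIONS, LENGTHS))

if __name__ == "__main__":
    sample = ["beseda", "besedilo", "miza"]  # toy input, not corpus data
    print(split_word("beseda", 3, "initial"))
    print(split_word("beseda", 2, "final"))
    print(part_frequencies(sample, 3, "initial"))
    print(len(LIST_CONFIGS))
```

The configuration product reproduces the count given in the description: 10 lists per unit (5 lengths for each of the 2 positions), hence 20 lists for lemmas and lower-case word forms together.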
Date: 2019-11-18
Type: lexicalConceptualResource
Handle: http://hdl.handle.net/11356/1275
Language: slv
License: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/ (PUB)
Formats: text/plain; charset=utf-8; application/zip
downloadable_files_count: 30
Publishers: Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute
http://slovnica.ijs.si/