2026-05-03T08:49:42Z
http://www.clarin.si/repository/oai/request

oai:www.clarin.si:11356/1269
2023-03-27T17:01:19Z
hdl_11356_1023
hdl_11356_1024

Frequency lists of words from the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja; Krek, Simon
frequency list; spoken corpus; words; lemmas; normalized forms

Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, two lists were extracted: 1) one containing lemmas and their text-type distribution, 2) one containing lower-case word forms as well as their normalized forms, lemmas, and morphosyntactic tags along with their text-type distribution. In addition, four lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all lower-case word forms with their normalized word forms, lemmas, part-of-speech categories, and text-type distribution; 4) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns).

2019-11-18
lexicalConceptualResource
http://hdl.handle.net/11356/1269
slv
http://hdl.handle.net/11356/1364
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
PUB
text/plain; charset=utf-8
application/zip
downloadable_files_count: 1
Centre for Language Resources and Technologies, University of Ljubljana
Jožef Stefan Institute
http://slovnica.ijs.si/
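
The description above mentions absolute and relative frequencies, percentages, and text-type distribution for each lemma. The following is a minimal sketch of how such a list could be computed, not the method used by the LIST tool. The toy tokens, the text-type labels "public" and "private", the function name `frequency_list`, and the choice of "per million tokens" as the relative frequency are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (lemma, text_type) pairs. Not taken from GOS 1.0.
tokens = [
    ("biti", "public"),
    ("biti", "private"),
    ("in", "public"),
    ("biti", "public"),
]


def frequency_list(tokens):
    """Build one row per lemma, sorted by descending absolute frequency."""
    total = len(tokens)
    absolute = Counter(lemma for lemma, _ in tokens)

    # Count occurrences of each lemma within each text type.
    by_type = defaultdict(Counter)
    for lemma, text_type in tokens:
        by_type[lemma][text_type] += 1

    rows = []
    for lemma, count in absolute.most_common():
        rows.append({
            "lemma": lemma,
            "absolute": count,
            # Assumption: relative frequency expressed per million tokens.
            "relative_per_million": count / total * 1_000_000,
            "percentage": count / total * 100,
            "by_text_type": dict(by_type[lemma]),
        })
    return rows


rows = frequency_list(tokens)
for row in rows:
    print(row)
```

The same grouping could be applied to lower-case word forms or morphosyntactic tags by changing what is used as the counting key.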