Reference List of Slovene Frequent Common Words

Authors: Pollak, Senja ; Arhar Holdt, Špela ; Krek, Simon and Robnik-Šikonja, Marko

Item identifier: http://hdl.handle.net/11356/1346

Project URL: https://kauc.splet.arnes.si/

Referenced by: http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_Arhar-Holdt-et-al_Referencni-seznam-pogostih-splosnih-besed-za-slovenscino.pdf

Date issued: 2020-09-10

Type: lexicalConceptualResource, text

Size: 4768 entries

Language(s): Slovenian

Description: The reference list of the most frequent common Slovene words was prepared by selecting the vocabulary at the intersection of the 10,000 most frequent lemmas in each of four Slovene text corpora: Kres, the balanced reference corpus of written Slovene; GOS, the reference corpus of spoken Slovene; Janes, the corpus of computer-mediated communication; and Šolar 2.0, the corpus of school written production. The list was then cleaned manually and contains 4,768 common general lemmas. The file is in tab-separated format and contains, for each entry, the lemma, the part of speech (following the MULTEXT-East tagset for Slovene), the relative average reduced frequency in each of the corpora, and a final average score computed from these values. The dataset is described in more detail in: Špela Arhar Holdt, Senja Pollak, Marko Robnik Šikonja, Simon Krek (2020). Referenčni seznam pogostih splošnih besed za slovenščino. In Proceedings of the Conference on Language Technologies and Digital Humanities, pp. 10-15.
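The file layout described above could be read along the following lines. This is only a minimal sketch under stated assumptions: the column order (lemma, part of speech, then Kres, GOS, Janes, Šolar 2.0, then the final score) follows the order in which the description names the corpora and is not confirmed by the record; whether the file has a header row is unknown, so rows whose numeric fields do not parse are skipped; and the names `Entry` and `parse_entries` are hypothetical.

```python
import csv
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One row of the list. Field order is an assumption, not documented."""
    lemma: str
    pos: str      # MULTEXT-East part-of-speech tag
    kres: float   # relative average reduced frequency in Kres (assumed column)
    gos: float    # ... in GOS (assumed column)
    janes: float  # ... in Janes (assumed column)
    solar: float  # ... in Šolar 2.0 (assumed column)
    score: float  # final average score


def parse_entries(lines: Iterable[str]) -> list[Entry]:
    """Parse tab-separated lines, skipping rows that do not fit the layout."""
    entries = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 7:
            continue  # not the assumed seven-column layout
        try:
            values = [float(value) for value in row[2:]]
        except ValueError:
            continue  # e.g. a header row, if the file has one
        entries.append(Entry(row[0], row[1], *values))
    return entries


# Possible usage (file name from the record; UTF-8 encoding is an assumption):
# with open("SloveneFrequentCommonWords.txt", encoding="utf-8", newline="") as f:
#     entries = parse_entries(f)
```

Skipping unparseable rows keeps the sketch tolerant of a header line, but it would also silently drop rows if the file used a decimal comma, so the number of parsed entries is worth comparing against the 4,768 entries stated in the record.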

Publisher: Jožef Stefan Institute; Centre for Language Resources and Technologies, University of Ljubljana

Subject(s): common words, frequent words, reference corpora, readability

Collection(s): CLARIN.SI data & tools

Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: SloveneFrequentCommonWords.zip
Size: 69.51 KB
Format: application/zip
Description: Slovene frequent common words in TSV format
MD5: 4936baa23df5ed2a9f9a8a0c924b25fd
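
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The sketch below is one way to do that with Python's standard `hashlib` module; the helper name `md5_of_file` and the assumption that the archive was saved under its listed name in the current directory are illustrative, not part of the record.

```python
import hashlib

# Checksum as published in the file record above.
EXPECTED_MD5 = "4936baa23df5ed2a9f9a8a0c924b25fd"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Possible usage (assumes the archive was saved under its listed name):
# if md5_of_file("SloveneFrequentCommonWords.zip") != EXPECTED_MD5:
#     raise SystemExit("checksum mismatch: download may be corrupted")
```

MD5 is suitable here only as an integrity check against accidental corruption, which is how the repository publishes it.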

File Preview

- SloveneFrequentCommonWords.txt (480 kB)