dc.contributor.author | Pollak, Senja |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2020-09-24T15:32:42Z |
dc.date.available | 2020-09-24T15:32:42Z |
dc.date.issued | 2020-09-10 |
dc.identifier.uri | http://hdl.handle.net/11356/1346 |
dc.description | The reference list of the most frequent common Slovene words was prepared by selecting the vocabulary at the intersection of the 10,000 most frequent lemmas of four Slovene text corpora: the balanced reference corpus of written Slovene Kres, the reference corpus of spoken Slovene GOS, the corpus of computer-mediated communication Janes, and the corpus of school written production Šolar 2.0. The list was then manually cleaned and contains 4,768 common general lemmas. The file is in tab-separated format and contains the lemma, its part of speech (following the MULTEXT-East tagset for Slovene), the relative average reduced frequency in each of the corpora, and a final average score computed from these values. The dataset is described in more detail in: Špela Arhar Holdt, Senja Pollak, Marko Robnik Šikonja, Simon Krek (2020). Referenčni seznam pogostih splošnih besed za slovenščino. In Proceedings of the Conference on Language Technologies and Digital Humanities, pp. 10-15. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825153 |
dc.relation.isreferencedby | http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_Arhar-Holdt-et-al_Referencni-seznam-pogostih-splosnih-besed-za-slovenscino.pdf |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://kauc.splet.arnes.si/ |
dc.subject | common words |
dc.subject | frequent words |
dc.subject | reference corpora |
dc.subject | readability |
dc.title | Reference List of Slovene Frequent Common Words |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Špela Arhar Holdt arhar.spela@gmail.com Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
sponsor | European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153 |
sponsor | Ministry of Education, Science and Sport 3330-17-1748 KAUČ - Improving the Quality of Slovene Textbooks/Za kakovost slovenskih učbenikov nationalFunds |
size.info | 4768 entries |
files.count | 1 |
files.size | 71183 |
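The tab-separated layout named in dc.description can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling. The record lists the fields but does not state their column order, the header labels, or the decimal format, so the names in `ASSUMED_COLUMNS`, the `has_header` flag, and the function name `parse_word_list` are all assumptions to be checked against the actual file.

```python
import csv
import io
from typing import Dict, List

# ASSUMPTION: column order follows the wording of dc.description
# (lemma, part of speech, one frequency per corpus, final score).
# The record does not confirm this order or any header labels.
ASSUMED_COLUMNS = ["lemma", "pos", "kres", "gos", "janes", "solar", "score"]
CORPUS_COLUMNS = ASSUMED_COLUMNS[2:6]


def parse_word_list(text: str, has_header: bool = False) -> List[Dict[str, object]]:
    """Parse tab-separated text into one dictionary per lemma."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for index, fields in enumerate(reader):
        if has_header and index == 0:
            continue  # skip the header row if the file has one
        if len(fields) != len(ASSUMED_COLUMNS):
            continue  # skip blank lines and rows that do not fit the assumed layout
        lemma, pos = fields[0], fields[1]
        # ASSUMPTION: numbers use a decimal point, not a decimal comma.
        numbers = [float(value) for value in fields[2:]]
        entries.append({
            "lemma": lemma,
            "pos": pos,
            "frequencies": dict(zip(CORPUS_COLUMNS, numbers[:4])),
            "score": numbers[4],
        })
    return entries
```

After extracting the archive, the function can be applied to the decoded file contents. If the assumed layout is right, `len(parse_word_list(text))` should agree with the 4768 entries given in size.info.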
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

- Name: SloveneFrequentCommonWords.zip
- Size: 69.51 KB
- Format: application/zip
- Description: Slovene frequent common words in TSV format
- MD5: 4936baa23df5ed2a9f9a8a0c924b25fd
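
The MD5 value listed for the archive can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`. The helper names `md5_of_file` and `matches_record` are illustrative, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed in this record for SloveneFrequentCommonWords.zip.
EXPECTED_MD5 = "4936baa23df5ed2a9f9a8a0c924b25fd"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """Compare a local file's digest with the checksum in this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

A call such as `matches_record("SloveneFrequentCommonWords.zip")` returns `True` only when the local copy matches the published checksum. MD5 is suitable here as a check against transfer errors, not as a security guarantee.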