
 
dc.contributor.author Pollak, Senja
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Krek, Simon
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2020-09-24T15:32:42Z
dc.date.available 2020-09-24T15:32:42Z
dc.date.issued 2020-09-10
dc.identifier.uri http://hdl.handle.net/11356/1346
dc.description The reference list of the most frequent common words in Slovene was prepared by selecting the vocabulary found at the intersection of the 10,000 most frequent lemmas in each of four Slovene text corpora: Kres, the balanced reference corpus of written Slovene; GOS, the reference corpus of spoken Slovene; Janes, the corpus of computer-mediated communication; and Šolar 2.0, the corpus of school written production. The list was then manually cleaned and contains 4,768 common general lemmas. The file is in tab-separated format and gives, for each entry, the lemma, its part of speech (following the MULTEXT-East tagset for Slovene), its relative average reduced frequency in each of the four corpora, and the final average score computed from these values. The dataset is described in more detail in: Špela Arhar Holdt, Senja Pollak, Marko Robnik Šikonja, Simon Krek (2020). Referenčni seznam pogostih splošnih besed za slovenščino. In Proceedings of the Conference on Language Technologies and Digital Humanities, pp. 10-15.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_Arhar-Holdt-et-al_Referencni-seznam-pogostih-splosnih-besed-za-slovenscino.pdf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://kauc.splet.arnes.si/
dc.subject common words
dc.subject frequent words
dc.subject reference corpora
dc.subject readability
dc.title Reference List of Slovene Frequent Common Words
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Špela Arhar Holdt arhar.spela@gmail.com Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
sponsor Ministry of Education, Science and Sport 3330-17-1748 KAUČ - Improving the Quality of Slovene Textbooks/Za kakovost slovenskih učbenikov nationalFunds
size.info 4768 entries
files.count 1
files.size 71183
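
The tab-separated layout given in dc.description can be read roughly as follows. This is a minimal sketch, not a description of the actual file: the column order (lemma, part of speech, one relative average reduced frequency per corpus in the order Kres, GOS, Janes, Šolar, then the final score), the use of a decimal point, and the possible presence of a header row are all assumptions, and the names `CORPORA`, `Entry` and `parse_entries` are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple

# ASSUMPTION: corpus columns appear in this order; check the real file first.
CORPORA = ("Kres", "GOS", "Janes", "Solar")


class Entry(NamedTuple):
    lemma: str
    pos: str
    rel_arf: Dict[str, float]  # corpus name -> relative average reduced frequency
    score: float               # final average score


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per well-formed tab-separated line."""
    expected_columns = 2 + len(CORPORA) + 1
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != expected_columns:
            continue  # blank or malformed line
        try:
            values = [float(v) for v in row[2:]]
        except ValueError:
            continue  # non-numeric fields, e.g. a header row if one exists
        yield Entry(row[0], row[1], dict(zip(CORPORA, values[:-1])), values[-1])
```

With the extracted text file opened as `open(path, encoding="utf-8", newline="")`, the file object can be passed straight to `parse_entries`. Rows that do not fit the assumed shape are skipped silently, so comparing the number of parsed entries with the 4768 entries stated in size.info is a quick way to test the assumptions.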


 Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name SloveneFrequentCommonWords.zip
Size 69.51 KB
Format application/zip
Description Slovene frequent common words in TSV format
MD5 4936baa23df5ed2a9f9a8a0c924b25fd
 File preview
    • SloveneFrequentCommonWords.txt (480 kB)
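
A downloaded copy of SloveneFrequentCommonWords.zip can be checked against the MD5 value listed in this record. The sketch below uses only the Python standard library; the function names `md5_of_file` and `matches_record` are hypothetical, and the local path of the download is whatever the user chose.

```python
import hashlib

# MD5 value as listed in this record for SloveneFrequentCommonWords.zip.
EXPECTED_MD5 = "4936baa23df5ed2a9f9a8a0c924b25fd"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

A mismatch most likely indicates an incomplete or corrupted download. MD5 is adequate for this kind of integrity check but is not a safeguard against deliberate tampering.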
