What's New
lexicalConceptualResource
Description:
The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. A distractor is defined as an incorrect alternative to the synonym, which can be similar to the synonym in meaning and/or form. Headwords ...
This entry contains 1 file (829.85 KB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-10 covers the period from January 2019 to October 2025, complementing the ...
This entry contains no files.
corpus
Description:
The SI-IUS collection of older law texts is meant to be used both as a digital library and as a language corpus. For the former, each text has been carefully annotated in TEI preserving e.g. different types of divisions ...
This entry contains 3 files (931.56 MB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This entry contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
lexicalConceptualResource
Description:
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens), and consists of 127 textbooks from 16 different ...
This entry contains 1 file (864.93 KB).
Publicly Available