What's New
lexicalConceptualResource
Description:
The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. A distractor is defined as an incorrect alternative to the synonym that can be similar to the synonym in meaning and/or form. Headwords ...
This item contains 1 file (829.85 KB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-10 covers the period from January 2019 to October 2025, complementing the ...
This item contains no files.
corpus
Description:
The SI-IUS collection of older law texts is meant to be used both as a digital library and as a language corpus. For the former, each text has been carefully annotated in TEI, preserving, e.g., different types of divisions ...
This item contains 3 files (931.56 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This item contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available
lexicalConceptualResource
Description:
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens), and consists of 127 textbooks from 16 different ...
This item contains 1 file (864.93 KB).
Publicly Available