What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-11 covers the period from January 2019 to November 2025, complementing the ...
This item contains no files.
corpus
Description:
The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were made between May and September 2025. The ...
This item contains 2 files (878.84
MB).
Publicly Available
corpus
Description:
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 10,590 texts (almost 1.4 million words) written by adult speakers for whom Slovene is not their first language. This corpus ...
This item contains 2 files (167.69
MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This item contains 7 files (4.64
GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94
GB).
Publicly Available
lexicalConceptualResource
Description:
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens), and consists of 127 textbooks from 16 different ...
This item contains 1 file (864.93
KB).
Publicly Available