What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-12 covers the period from January 2019 to December 2025, complementing the ...
This item contains no files.
lexicalConceptualResource
Description:
This digital dictionary of papermaking is based on the printed edition: Marjeta Humar (ed.), Papirniški terminološki slovar, 1996, ZRC SAZU (https://doi.org/10.3986/961618220X).
It is an explanatory, ...
This item contains 3 files (3.84 MB).
Publicly Available
lexicalConceptualResource
Description:
The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect alternative to a collocation that can resemble the collocation in meaning and/or form. Headwords and ...
This item contains 1 file (1.46 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
The Map task corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) consists of elicited conversations (map tasks) by 29 second-generation BCMS speakers originating from different regions of former Yugoslavia and ...
This item contains 2 files (751.91 KB).
Publicly Available
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This item contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available