What's New
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-09 covers the period from January 2019 to September 2025, complementing the ...
This item contains no files.
lexicalConceptualResource

Description:
DASSLE 1.0 (Dataset of Authentic and Synthetic Slovene Language Errors) comprises 7,385 manually prepared entries, each consisting of a Slovene sentence containing a single, specific language problem, its corrected version, ...
This item contains 1 file (370.11 KB).
Publicly Available




corpus

Description:
This entry contains the SLO-VLM-IT-Dataset, a comprehensive dataset designed for instruction-tuning vision-language models in the Slovenian language. It is composed of five main .json files, which together provide a rich ...
This item contains 1 file (462.68 MB).
Publicly Available



Most Viewed Items
Top Last Week
corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available


corpus

Description:
A dataset of user comments extracted from the comment database of the 24sata.hr news portal and provided for research purposes for EMBEDDIA, a Horizon 2020 project. 24sata.hr is the largest-circulation ...
This item contains 3 files (1.89 GB).
Publicly Available




corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (69.17 GB).
Publicly Available

