What's New
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-08 covers the period from January 2019 to August 2025, complementing the Gigafida ...
This item contains no files.
corpus

Description:
GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of prompts, instructions and responses from the ...
This item contains 1 file (41.6 MB).
Publicly Available


corpus

Description:
The ParlaMint-IL corpus is the Israeli contribution to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint), which contain transcriptions of parliamentary debates of European ...
This item contains 2 files (33.81 GB).
Publicly Available



Most Viewed Items
Top Last Week
corpus

Description:
The dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, was extracted from the database of user comments of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
This item contains 3 files (1.89 GB).
Publicly Available




corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available


corpus

Description:
The ParlaMint-IL corpus is the Israeli contribution to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint), which contain transcriptions of parliamentary debates of European ...
This item contains 2 files (33.81 GB).
Publicly Available


