What's New

 corpus 
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-08 covers the period from January 2019 to August 2025, complementing the Gigafida ...
 This item contains no files.
 corpus 
Description:
GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of prompts, instructions, and responses from the ...
 This item contains 1 file (41.6 MB).
 
Publicly available, distributed under Creative Commons, attribution required
 corpus 
Description:
The ParlaMint-IL corpus is the Israeli contribution to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint), which contain transcriptions of parliamentary debates of European ...
 This item contains 2 files (33.81 GB).
 
Publicly available, distributed under Creative Commons, attribution required, share alike

Most Viewed Items

Top Last Week
 toolService 
Description:
The X-GENRE classifier is a text classification model for automatic genre identification. The model classifies texts into one of 9 genre labels: Information/Explanation, News, Instruction, Opinion/Argumentation, ...
 This item contains 1 file (779.93 MB).
 
Publicly available, distributed under Creative Commons, attribution required, share alike
 corpus 
Description:
A trilingual parallel corpus of the General Data Protection Regulation. The corpus contains 54,468 words in English, 42,566 words in Lithuanian, and 47,740 words in Danish.
 This item contains no files.
 corpus 
Description:
A dataset of user comments, provided for research purposes to EMBEDDIA, a Horizon 2020 project, and extracted from the user-comment database of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
 This item contains 3 files (1.89 GB).
 
Publicly available, distributed under Creative Commons, attribution required, noncommercial, no derivative works