What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-08 covers the period from January 2019 to September 2024, complementing the ...
This item contains no files.
corpus
Description:
GaMS-Instruct-DH is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions. It consists of pairs of prompts and responses, some of which contain an additional context ...
This item contains 1 file (888.96 KB).
Publicly Available
corpus
Description:
GaMS-Instruct-GEN is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions. It consists of pairs of prompts and responses, some of which contain an additional input ...
This item contains 1 file (3.12 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
The Montenegrin web corpus MaCoCu-cnr 1.0 was built by crawling the ".me" internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.c ...
This item contains 2 files (500.14 MB).
Publicly Available
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
lexicalConceptualResource
Description:
Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approx. 100,000 most frequent Slovenian lemmas, ...
This item contains 2 files (85.8 MB).
Publicly Available