What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January 2019 to December 2024, complementing the ...
This entry contains no files.
corpus
Description:
The training corpus of spoken Slovenian, ROG 1.0, is the main resource for training and evaluating technologies for Slovenian aimed at processing speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
This entry contains 2 files (1.33 GB).
Publicly Available
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3000 most frequent Slovenian verbs and the 5300 most frequent BCMS verbs, all coded for a number of properties ranging from their phonology and morphology to their ...
This entry contains 3 files (11.43 MB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
A dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, and extracted from the database of user comments of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
This entry contains 3 files (1.89 GB).
Publicly Available
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 30 files (5.87 GB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (2.17 GB).
Publicly Available