What's New
corpus
Description:
The CVET corpus contains 230 texts (around 175 thousand words) of varying length, published in the religious journal "Cvetje z vertov sv. Frančiška" between 1887 and 1916, when the magazine was edited by the linguist Fr. ...
This item contains 4 files (15.02 MB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 73 publishers. Trendi 2024-04 covers the period from January 2019 to April 2024, complementing the Gigafida ...
This item contains no files.
corpus
Description:
The DIALECT-COPA datasets comprise Choice of Plausible Alternatives (COPA) datasets for three South Slavic dialects: (1) COPA-SL-CER for the Cerkno dialect of Slovenian, spoken in the Slovenian Littoral region, specifically ...
This item contains 6 files (279.69 KB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
ParlaMint 4.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.67 GB).
Publicly Available
corpus
Description:
This corpus is specialized, static (i.e., no future growth is planned), diachronic and covers the period from 2002 to 2022.
The SMS messages included in this corpus were obtained from voluntary donors (informants). Both ...
This item contains 1 file (1.69 MB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (2.17 GB).
Publicly Available