What's New
corpus
Description:
Gigafida 2.2 is a reference corpus of written Slovene texts published in the period 1990-2018. It comprises daily news, magazines, a selection of web texts (a portion of which also covers news texts), and ...
This entry contains no files.
corpus
Description:
Gigafida 2.1 is a reference corpus of written Slovene texts published in the period 1990-2018. It comprises daily news, magazines, a selection of web texts (a portion of which also covers news texts), and ...
This entry contains no files.
corpus
Description:
SLawNLI is a human-annotated dataset for Natural Language Inference (NLI) in the Slovenian legal domain. It contains 2,214 examples constructed according to the standard NLI schema (premise, hypothesis, label). The dataset ...
This entry contains 1 file (275.16 KB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899. Each text contains extensive metadata and per-page ...
This entry contains 2 files (8.9 MB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (2.17 GB).
Publicly Available