What's New
corpus
Description:
GaMS-Instruct-MED-Termset is an instruction-following dataset containing 975,060 prompt-response units in Slovene from the medical domain. It focuses on medical terms, with explanations for clinical and patient use and ...
This entry contains 1 file (21.06 MB).
Publicly Available
corpus
Description:
GaMS-Instruct-MED-Anatomy is an instruction-following dataset containing 711,805 prompt-response units in Slovene (with English and Latin terminology). The units form a structured, machine-readable database of Slovenian ...
This entry contains 1 file (29 MB).
Publicly Available
lexicalConceptualResource
Description:
MEZZANINE-NstdLex is a dataset containing 4,237 potentially non-standard vocabulary candidates from the Sloleks Morphological Lexicon of Slovene (collected from among the manually inspected entries of version 3.0; ...
This entry contains 1 file (82.14 KB).
Publicly Available
Most Viewed
In the Past Week
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (2.17 GB).
Publicly Available
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899. Each text contains extensive meta-data and per-page ...
This entry contains 2 files (8.9 MB).
Publicly Available