What's New
corpus
Description:
SLawNLI is a human-annotated dataset for Natural Language Inference (NLI) in the Slovenian legal domain. It contains 2,214 examples constructed according to the standard NLI schema (premise, hypothesis, label). The dataset ...
This entry contains 1 file (275.16 KB).
Publicly Available
corpus
Description:
The Spook corpus was compiled to enable corpus-based studies in translation and comprises 713 texts and about 375 thousand words. It is composed of three types of texts. The first comprises foreign language texts in French, ...
This entry contains 2 files (59.54 MB).
Academic Use
lexicalConceptualResource
Description:
This dataset provides word-level multidimensional morphological annotations for Slovene, containing 1,935 entries manually annotated by two domain experts. The target words in the dataset were sampled from Sloleks 3.0 to ...
This entry contains 1 file (95.62 KB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899.
Each text contains extensive metadata and per-page ...
This entry contains 2 files (8.9 MB).
Publicly Available
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
This entry contains 3 files (93.95 KB).
Publicly Available