What's New
corpus
Description:
SLawNLI is a human-annotated dataset for Natural Language Inference (NLI) in the Slovenian legal domain. It contains 2,214 examples constructed according to the standard NLI schema (premise, hypothesis, label). The dataset ...
This item contains 1 file (275.16
KB).
Publicly Available
corpus
Description:
The Spook corpus was compiled to enable corpus-based studies in translation and comprises 713 texts and about 375 thousand words. It is composed of three types of texts. The first comprises foreign language texts in French, ...
This item contains 2 files (59.54
MB).
Academic Use
lexicalConceptualResource
Description:
This dataset provides word-level multidimensional morphological annotations for Slovene, containing 1,935 entries manually annotated by two domain experts. The target words in the dataset were sampled from Sloleks 3.0 to ...
This item contains 1 file (95.62
KB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899.
Each text contains extensive meta-data and per-page ...
This item contains 2 files (8.9
MB).
Publicly Available
corpus
Description:
COLESLAW 1.0 is a large-scale collection of Slovenian legal texts compiled from authoritative public sources. The corpus covers legislative, judicial, and governmental legal documents and is designed to support research ...
This item contains 1 file (1.24
GB).
Publicly Available
corpus
Description:
The spoken corpus Gos 2.0 is the reference speech corpus of the Slovenian language. This second edition contains about 300 hours of speech, or 2.4 million words, 127 thousand utterances and 1,500 texts.
Gos 2.0 is ...
This item contains 2 files (60.07
MB).
Publicly Available