What's New
corpus
Description:
COLESLAW 1.0 is a large-scale collection of Slovenian legal texts compiled from authoritative public sources. The corpus covers legislative, judicial, and governmental legal documents and is designed to support research ...
This entry contains 1 file (1.24 GB).
Publicly Available
corpus
Description:
This is a large-scale multilingual benchmark for evaluating metalinguistic knowledge (i.e. explicit knowledge about the structure of languages) in large language models using grammatical features from the World Atlas of ...
This entry contains 1 file (1.75 MB).
Publicly Available
corpus
Description:
The Corpus-grounded evaluation dataset for grammatical question answering (GramQA) consists of 13 grammatical questions inspired by WALS, the World Atlas of Language Structures (https://wals.info/), focusing on word order ...
This entry contains 1 file (38.04 KB).
Publicly Available
Most Viewed
In the Past Week
corpus
Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 31 files (5.94 GB).
Publicly Available
corpus
Description:
The CLASSLA-web 2.0 collection is a large-scale, comparable set of web corpora covering all seven South Slavic languages: Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian. This second major ...
This entry contains 29 files (455.27 GB).
Publicly Available
corpus
Description:
goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899. Each text contains extensive meta-data and per-page ...
This entry contains 2 files (8.9 MB).
Publicly Available