What's New
corpus

Description:
The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema ...
This entry contains 8 files (7.43 MB).
Publicly Available



toolService

Description:
The inflectional data lookup module serves as an optional component within the cordex library (https://github.com/clarinsi/cordex/) that significantly improves the quality of the results. The module consists of a pickled ...
This entry contains 1 file (31.44 MB).
Publicly Available




toolService

Description:
This is a collection of modular teaching and learning content created in the UPSKILLS project (UPgrading the SKIlls of Linguistics and Language Students) and downloaded from the Moodle platform in .mbz format. The learning ...
This entry contains 13 files (1.94 GB).
Publicly Available


Most Viewed
In the past week
corpus

Description:
The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities.
The ...
This entry contains 3 files (10.91 MB).
Publicly Available



corpus

Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
This entry contains 3 files (91.53 MB).
Publicly Available



corpus

Description:
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora being between 9 and 125 million words in size.
The ...
This entry contains 27 files (5.22 GB).
Publicly Available

