What's New
toolService

Description:
This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It predicts sentence segmentation, tokenization, lemmatization, language-specific morphological annotation ...
This item contains 1 file (142.95 MB).
Publicly Available
corpus

Description:
The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema ...
This item contains 8 files (7.43 MB).
Publicly Available



toolService

Description:
The inflectional data lookup module is an optional component of the cordex library (https://github.com/clarinsi/cordex/) that significantly improves the quality of its results. The module consists of a pickled ...
This item contains 1 file (31.44 MB).
Publicly Available




Most Viewed Items
Top Last Week
corpus

Description:
The SETimes.SR training corpus contains 86,726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities.
The ...
This item contains 3 files (10.91 MB).
Publicly Available



corpus

Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
This item contains 3 files (91.53 MB).
Publicly Available



corpus

Description:
ParlaMint 3.0 is a multilingual set of 26 comparable corpora of parliamentary debates, mostly starting in 2015 and extending to mid-2022, with individual corpora ranging from 9 to 125 million words in size.
The ...
This item contains 27 files (5.22 GB).
Publicly Available

