What's New
toolService

Description:
This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It can predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation ...
This entry contains 1 file (142.95 MB).
Publicly Available
corpus

Description:
The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema ...
This entry contains 8 files (7.43 MB).
Publicly Available



toolService

Description:
The inflectional data lookup module serves as an optional component within the cordex library (https://github.com/clarinsi/cordex/) that significantly improves the quality of the results. The module consists of a pickled ...
This entry contains 1 file (31.44 MB).
Publicly Available




Most Viewed
In the past week
corpus

Description:
The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities.
The ...
This entry contains 3 files (10.91 MB).
Publicly Available



corpus

Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
This entry contains 3 files (91.53 MB).
Publicly Available



corpus

Description:
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora being between 9 and 125 million words in size.
The ...
This entry contains 27 files (5.22 GB).
Publicly Available

