What's New
corpus

Description:
The dataset was created using a large number of Serbian Legislation texts gathered from the https://www.pravno-informacioni-sistem.rs/ website. The gathered texts were used for fine-tuning a neural network called SRBerta ...
This entry contains 5 files (66.42 MB).
Publicly Available


toolService

Description:
SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).
The model ...
This entry contains 1 file (387.44 MB).
Publicly Available



corpus

Description:
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, ...
This entry contains 2 files (128.43 MB).
Publicly Available
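
The span-based answer format described above can be illustrated with a minimal sketch. It assumes each answer is stored as its text plus a character offset into the source passage; the `SpanAnswer` class, the `extract_span` helper, and the sample sentence are hypothetical and are not taken from the dataset files.

```python
from dataclasses import dataclass


@dataclass
class SpanAnswer:
    """Hypothetical container: answer text plus its character offset."""
    text: str
    answer_start: int  # offset of the first answer character in the context


def extract_span(context: str, answer: SpanAnswer) -> str:
    """Return the segment of `context` that the answer points at."""
    end = answer.answer_start + len(answer.text)
    return context[answer.answer_start:end]


# Made-up passage, for illustration only.
context = "The corpus was collected from Wikipedia articles."
answer = SpanAnswer(
    text="Wikipedia articles",
    answer_start=context.index("Wikipedia"),
)

# A well-formed span answer must reproduce its own text from the passage.
assert extract_span(context, answer) == answer.text
```

A check like the final assertion is one way to validate that the stored offsets and answer texts agree after downloading the files.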


Most Viewed
In the past week
corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (2.17 GB).
Publicly Available


corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This entry contains 18 files (23.37 GB).
Publicly Available


corpus

Description:
The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with some parts also containing further manually ...
This entry contains 2 files (43.14 MB).
Publicly Available


