What's New
lexicalConceptualResource
Description:
ArboSloleks is a dataset containing Slovene word formation trees that have been automatically constructed from word relations (http://hdl.handle.net/11356/1986) extracted from Sloleks 2.0 (http://hdl.handle.net/11356/1230). ...
This item contains 1 file (2.53 MB).
Publicly Available
corpus
Description:
This corpus consists of editions of three volumes of sermons written by Ignatius Holzapfel (1799-1866) when he was active as parish priest in Črnomelj and Ribnica. The bulk of Holzapfel's manuscript legacy remained ...
This item contains 1 file (278.19 KB).
Publicly Available
corpus
Description:
The document contains a diplomatic transcription of over 285 pages of manuscript documents about the Slovenian mystic Magdalena Gornik (1835-1896) from the village of Gora near Sodražica. The vast majority of the documents ...
This item contains 1 file (866.85 KB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
corpus
Description:
The ssj500k training corpus is based on two training corpora built within the JOS project (https://nl.ijs.si/jos/). It contains the jos100k corpus and additional material from the jos1M corpus forming a training corpus ...
This item contains 3 files (17.7 MB).
Publicly Available
corpus
Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
This item contains 3 files (91.53 MB).
Publicly Available