What's New
lexicalConceptualResource
Description:
This digital dictionary of papermaking is based on the printed edition: Marjeta Humar (ed.), Papirniški terminološki slovar, 1996, ZRC SAZU (https://doi.org/10.3986/961618220X).
It is an explanatory, ...
This item contains 3 files (3.84 MB).
Publicly Available
lexicalConceptualResource
Description:
The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect answer or alternative to a collocation that may resemble the collocation in meaning and/or form. Headwords and ...
This item contains 1 file (1.46 MB).
Publicly Available
corpus
Description:
The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, accompanied by some texts (about 6% of the corpus) in English, Italian and German. TURK 3.0 contains almost 1,460 texts or 20 ...
This item contains 3 files (820.52 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This item contains 7 files (4.64 GB).
Publicly Available
lexicalConceptualResource
Description:
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens), and consists of 127 textbooks from 16 different ...
This item contains 1 file (864.93 KB).
Publicly Available
corpus
Description:
The Croatian web corpus hrWaC was built by crawling the .hr top-level domain in 2011 and again in 2014. The corpus was near-deduplicated on paragraph level, normalised via diacritic restoration, morphosyntactically annotated ...
This item contains 15 files (9.21 GB).
Publicly Available