What's New
lexicalConceptualResource
Description:
This digital dictionary of papermaking is based on the printed edition:
Marjeta Humar (ed.), Papirniški terminološki slovar, 1996, ZRC SAZU (https://doi.org/10.3986/961618220X).
It is an explanatory, ...
This item contains 3 files (3.84 MB).
Publicly Available
lexicalConceptualResource
Description:
The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect answer/alternative to a collocation, which can be similar to the collocation in meaning and/or form. Headwords and ...
This item contains 1 file (1.46 MB).
Publicly Available
corpus
Description:
The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, accompanied by some texts (about 6% of the corpus) in English, Italian and German. TURK 3.0 contains almost 1,460 texts or 20 ...
This item contains 3 files (820.52 MB).
Publicly Available
Most Viewed Items
Top Last Week
lexicalConceptualResource
Description:
Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approx. 100,000 of the most frequent Slovenian lemmas, ...
This item contains 5 files (79.77 MB).
Publicly Available
corpus
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
This item contains 7 files (4.64 GB).
Publicly Available
corpus
Description:
The Croatian web corpus hrWaC was built by crawling the .hr top-level domain in 2011 and again in 2014. The corpus was near-deduplicated at the paragraph level, normalised via diacritic restoration, morphosyntactically annotated ...
This item contains 15 files (9.21 GB).
Publicly Available