What's New

 lexicalConceptualResource 
Description:
This digital dictionary of papermaking was created from the printed edition, i.e. Marjeta Humar (ed.), Papirniški terminološki slovar, ZRC SAZU, 1996 (https://doi.org/10.3986/961618220X). It is an explanatory, ...
 This item contains 3 files (3.84 MB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required
 lexicalConceptualResource 
Description:
The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect alternative to a collocation, which may resemble the collocation in meaning and/or form. Headwords and ...
 This item contains 1 file (1.46 MB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required
 corpus 
Description:
The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, supplemented by some texts (about 6% of the corpus) in English, Italian and German. TURK 3.0 contains almost 1,460 texts or 20 ...
 This item contains 3 files (820.52 MB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required

Most Viewed Items

Top Last Week
 corpus 
Description:
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and the show it hosts, '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...
 This item contains 7 files (4.64 GB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required | Share Alike
 lexicalConceptualResource 
Description:
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens) and consists of 127 textbooks from 16 different ...
 This item contains 1 file (864.93 KB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required
 corpus 
Description:
The Croatian web corpus hrWaC was built by crawling the .hr top-level domain in 2011 and again in 2014. The corpus was near-deduplicated at the paragraph level, normalised via diacritic restoration, morphosyntactically annotated ...
 This item contains 15 files (9.21 GB).
 
Publicly Available | Distributed under Creative Commons | Attribution Required | Share Alike