What's New

 corpus 
corpus
Description:
The Slovene Web genre identification corpus GINCO 1.0 contains web texts, manually annotated with genre, from two Slovene web corpora, the slWaC 2.0 corpus, crawled in 2014, and a web corpus, crawled in 2021 in the scope ...
 This item contains 2 files (1.77 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 lexicalConceptualResource 
lexicalConceptualResource
Description:
The resource contains several datasets containing domain-specific data in three languages, English, Slovenian and Croatian, which can be used for various knowledge extraction or knowledge modelling tasks. The resource ...
 This item contains 1 file (1.25 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required
 lexicalConceptualResource 
lexicalConceptualResource
Description:
2,060 recordings in mp3 format were made for the School Dictionary of the Slovenian Language based on the original recordings in wav format (48 kHZ, 24-bit). Around 600 recordings were made at the Institute of Ethnomusicology, ...
 This item contains 1 file (54.56 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required

Most Viewed Items

Top Last Week
 corpus 
corpus
Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations ...
 This item contains 1 file (14.12 MB).
 
Academic Use Attribution Required Noncommercial