What's New

 lexicalConceptualResource 
lexicalConceptualResource
Description:
Launched in December 2004 by the Domestic Research Society, Razvezani jezik (The Unleashed Tongue) is the first user-generated online dictionary of spoken Slovenian language. As a Wiki project, it allowed every visitor to ...
 This item contains 1 file (1.89 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required
 toolService 
toolService
Description:
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as contextually dependent word embeddings, used for ...
 This item contains 4 files (423.5 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 lexicalConceptualResource 
lexicalConceptualResource
Description:
This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0; http://hdl.handle.net/11356/1230) that have been automatically identified as ...
 This item contains 1 file (2.83 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike

Most Viewed Items

Top Last Week
 corpus 
corpus
Description:
ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting at the end of 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in ...
 This item contains 9 files (5.12 GB).
 
Publicly Available Distributed under Creative Commons Attribution Required
 corpus 
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate ...
 This item contains 16 files (49.38 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations ...
 This item contains 1 file (14.12 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Noncommercial Share Alike