Newest

lexicalConceptualResource
Description:
Launched in December 2004 by the Domestic Research Society, Razvezani jezik (The Unleashed Tongue) is the first user-generated online dictionary of the spoken Slovenian language. As a Wiki project, it allowed every visitor to ...
This entry contains 1 file (1.89 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required
toolService
Description:
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as contextually dependent word embeddings, used for ...
This entry contains 4 files (423.5 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike
lexicalConceptualResource
Description:
This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0; http://hdl.handle.net/11356/1230) that have been automatically identified as ...
This entry contains 1 file (2.83 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike

Most viewed

In the past week
corpus
Description:
The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. The submission contains 7 files: ...
This entry contains 8 files (616.88 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate ...
This entry contains 16 files (49.38 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Share Alike
corpus
Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations ...
This entry contains 1 file (14.12 MB).
 
Publicly Available, Distributed under Creative Commons, Attribution Required, Noncommercial, Share Alike