Najnovejše

 corpus 
corpus
Opis:
A hand-labeled training (50,000 tweets labeled twice) and evaluation set (10,000 tweets labeled twice) for hate speech on Slovenian Twitter. The data files contain tweet IDs, hate speech type, hate speech target, and ...
 Ta vnos vsebuje 4 datotek(e) (5.19 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Avtor(ji):
Opis:
The COPA-HR dataset (Choice of plausible alternatives in Croatian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html) by following the XCOPA dataset translation methodology ...
 Ta vnos vsebuje 3 datotek(e) (194.2 KB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 toolService 
toolService
Opis:
The monolingual Slovene RoBERTa (A Robustly Optimized Bidirectional Encoder Representations from Transformers) model is a state-of-the-art model representing words/tokens as contextually dependent word embeddings, used for ...
 Ta vnos vsebuje 2 datotek(e) (1.29 GB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike

Največ ogledov

V preteklem tednu
 corpus 
corpus
Avtor(ji):
Opis:
The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. The submission contains 7 files: ...
 Ta vnos vsebuje 8 datotek(e) (616.88 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Opis:
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1,7 billion tokens) written 2000 - 2018 and gathered from the digital ...
 Ta vnos vsebuje 6 datotek(e) (42.11 GB).
 
Academic Use Inform Before Use Attribution Required Noncommercial
 lexicalConceptualResource 
lexicalConceptualResource
Opis:
The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of the word; (2) the lemma, the base-form of the ...
 Ta vnos vsebuje 12 datotek(e) (16.27 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike