What's New

 corpus 
corpus
Description:
The Twitter-HBS dataset consists of Twitter users, their tweets, and the label of their predominantly used language - Bosnian, Croatian, Montenegrin, or Serbian. Among the tweets, there are also tweets in other languages ...
 Ta vnos vsebuje 1 datoteko (12.98 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Description:
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the already inactive setimes.com website publishing news in the languages of South-Eastern Europe. While the ...
 Ta vnos vsebuje 1 datoteko (20.15 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Description:
This corpus collects and annotates the extensive and highly valuable diachronic collection of Slovenian proverbs, 50 years and more in the making at the ZRC SAZU Institute of Slovenian Ethnology. It is composed of the ...
 Ta vnos vsebuje 3 datotek(e) (21.65 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required

Največ ogledov

V preteklem tednu
 lexicalConceptualResource 
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
 Ta vnos vsebuje 3 datotek(e) (93.95 KB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike
 corpus 
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate ...
 Ta vnos vsebuje 16 datotek(e) (49.38 MB).
 
Publicly Available Distributed under Creative Commons Attribution Required Share Alike