What's New

corpus
Description:
The Twitter-HBS dataset consists of Twitter users, their tweets, and a label for each user's predominantly used language: Bosnian, Croatian, Montenegrin, or Serbian. Among the tweets, there are also tweets in other languages ...
 This item contains 1 file (12.98 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
corpus
Description:
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the now-inactive setimes.com website, which published news in the languages of South-Eastern Europe. While the ...
 This item contains 1 file (20.15 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
corpus
Description:
This corpus collects and annotates the extensive and highly valuable diachronic collection of Slovenian proverbs, more than 50 years in the making at the ZRC SAZU Institute of Slovenian Ethnology. It is composed of the ...
 This item contains 3 files (21.65 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required.

Most Viewed Items

Top Last Week
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
 This item contains 3 files (93.95 KB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora, one for each of 15 European languages. The data can be used to train and evaluate ...
 This item contains 16 files (49.38 MB).
 
Publicly Available. Distributed under Creative Commons: Attribution Required, Share Alike.