What's New

corpus
Description:
EACL Hackashop Keyword Challenge Datasets. This repository contains the IDs of the articles used for the keyword extraction challenge at the EACL Hackashop on News Media Content Analysis and Automated Report Generation ...
 This item contains 1 file (224.84 KB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; No Derivative Works
corpus
Description:
The FRENK dataset consists of comments on Facebook posts (news articles) by mainstream media outlets from Croatia, Great Britain, and Slovenia, on the topics of migrants and LGBT. The dataset contains whole discussion ...
 This item contains 1 file (4.17 MB).
 
Academic Use; Inform Before Use; Attribution Required; Noncommercial
corpus
Description:
The Maj68 corpus contains 874 texts published between 1964 and 1972 in the periodicals "Tribuna", "Problemi" and "Problemi. Literatura." The texts contain complete bibliographical data, are classified according to text and ...
 This item contains 5 files (786.02 MB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike

Most Viewed Items

Top Last Week
corpus
Description:
The SOFES speech database (Spoken Flight Enquiries in Slovene) is a collection of transcribed and segmented audio recordings of spoken flight-information enquiries in Slovene. SOFES is built on the basis of the GOPOLIS ...
 This item contains 3 files (1.4 GB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike
corpus
Description:
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate ...
 This item contains 16 files (49.38 MB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Share Alike