dc.contributor.author Mozetič, Igor
dc.contributor.author Grčar, Miha
dc.contributor.author Smailović, Jasmina
dc.date.accessioned 2016-02-23T10:08:53Z
dc.date.available 2016-04-25T21:45:18Z
dc.date.issued 2016-02-23
dc.identifier.uri http://hdl.handle.net/11356/1054
dc.description The dataset contains over 1.6 million tweets (given as tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora, one for each of 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator agreement, or to study differences in language usage on Twitter. The data analysis is described in the following papers: I. Mozetič, M. Grčar, J. Smailović. Multilingual Twitter sentiment classification: The role of human annotators. PLoS ONE 11(5): e0155036, 2016. doi: 10.1371/journal.pone.0155036 (http://dx.doi.org/10.1371/journal.pone.0155036). I. Mozetič, L. Torgo, V. Cerqueira, J. Smailović. How to evaluate sentiment classifiers for Twitter time-ordered data? PLoS ONE 13(3): e0194317, 2018. doi: 10.1371/journal.pone.0194317 (https://dx.doi.org/10.1371/journal.pone.0194317).
dc.language.iso sqi
dc.language.iso bos
dc.language.iso bul
dc.language.iso hrv
dc.language.iso eng
dc.language.iso deu
dc.language.iso hun
dc.language.iso pol
dc.language.iso por
dc.language.iso srp
dc.language.iso rus
dc.language.iso slk
dc.language.iso slv
dc.language.iso spa
dc.language.iso swe
dc.publisher Jožef Stefan Institute
dc.relation info:eu-repo/grantAgreement/EC/FP7/610704
dc.relation info:eu-repo/grantAgreement/EC/FP7/317532
dc.relation info:eu-repo/grantAgreement/EC/H2020/640772
dc.relation.isreferencedby https://dx.doi.org/10.1371/journal.pone.0155036
dc.relation.isreferencedby https://dx.doi.org/10.1371/journal.pone.0194317
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.subject sentiment classification
dc.subject Twitter
dc.subject inter-annotator agreement
dc.subject annotator self-agreement
dc.subject multilingual
dc.title Twitter sentiment for 15 European languages
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Igor Mozetic igor.mozetic@ijs.si Jožef Stefan Institute
sponsor EC 610704 SIMPOL euFunds info:eu-repo/grantAgreement/EC/FP7/610704
sponsor EC 317532 MULTIPLEX euFunds info:eu-repo/grantAgreement/EC/FP7/317532
sponsor EC 640772 DOLFINS euFunds info:eu-repo/grantAgreement/EC/H2020/640772
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 1643735 items
files.count 16
files.size 51781021
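The description and subject fields mention inter-annotator agreement and annotator self-agreement. A minimal sketch of how such agreement could be counted, assuming rows have already been parsed into (TweetID, HandLabel, AnnotatorID) tuples. It uses plain pairwise percentage agreement, which is a simplification and not necessarily the measure used in the cited papers; the toy rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_agreement(rows):
    """Count agreeing label pairs among repeated annotations of the same tweet.

    rows: iterable of (tweet_id, hand_label, annotator_id) tuples.
    Returns {"self": (agree, total), "inter": (agree, total)}, where "self"
    compares two annotations by the same annotator and "inter" compares
    annotations by different annotators.
    """
    by_tweet = defaultdict(list)
    for tweet_id, label, annotator_id in rows:
        by_tweet[tweet_id].append((annotator_id, label))

    counts = {"self": [0, 0], "inter": [0, 0]}
    for annotations in by_tweet.values():
        # Compare every unordered pair of annotations of this tweet once.
        for (a1, l1), (a2, l2) in combinations(annotations, 2):
            kind = "self" if a1 == a2 else "inter"
            counts[kind][1] += 1
            if l1 == l2:
                counts[kind][0] += 1
    return {kind: tuple(c) for kind, c in counts.items()}


# Hypothetical toy rows (not taken from the dataset).
rows = [
    ("1", "Positive", "101"),
    ("1", "Positive", "101"),  # same annotator labels tweet 1 twice
    ("2", "Neutral", "101"),
    ("2", "Negative", "202"),  # two annotators disagree on tweet 2
    ("3", "Positive", "101"),  # annotated once: contributes no pairs
]
print(pairwise_agreement(rows))  # {'self': (1, 1), 'inter': (0, 1)}
```

Dividing `agree` by `total` gives the agreement rate; tweets annotated only once contribute no pairs.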


Files in this item (16 files, 49.38 MB in total)

This item is publicly available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
README.txt (665 bytes; format: Text file; description: Unknown; MD5: 8b2d34f643b73d8a44557dc8d2ba6d2f)
File preview:
There are 15 files for the corresponding 15 European languages:
Albanian, Bosnian, Bulgarian, Croatian, English, German, Hungarian,
Polish, Portuguese, Russian, Serbian, Slovak, Slovenian, Spanish, and Swedish.

Files are in the standard CSV format; each line has the following form:
TweetID,HandLabel,AnnotatorID

TweetID is assigned by Twitter and can be used to retrieve the tweet.
HandLabel is the sentiment label as assigned by the human annotator
(Negative, Neutral, or Positive).
AnnotatorID is a 3-digit integer assigned to anonymous annotators,
and can be used to identify tweets annotated several times by the
same or by different annotators. ...
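A minimal sketch of reading one of these files with Python's standard `csv` module, following the TweetID,HandLabel,AnnotatorID layout described in the README. Whether the files start with a header line is not stated, so the sketch assumes one may be present and skips any line whose first field is not numeric; rejecting labels outside Negative/Neutral/Positive is likewise a choice of this sketch. The sample IDs are hypothetical.

```python
import csv
import io

LABELS = {"Negative", "Neutral", "Positive"}


def read_sentiment_csv(lines):
    """Yield (tweet_id, hand_label, annotator_id) from TweetID,HandLabel,AnnotatorID lines.

    IDs are kept as strings: tweet IDs are large integers and annotator IDs
    are 3-digit codes, so string keys avoid any accidental arithmetic.
    """
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        tweet_id, label, annotator_id = (field.strip() for field in row[:3])
        if not tweet_id.isdigit():
            continue  # skip a header line, if a file has one (assumption)
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r} for tweet {tweet_id}")
        yield tweet_id, label, annotator_id


# Hypothetical sample; for a real file, pass
# open("English_Twitter_sentiment.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "TweetID,HandLabel,AnnotatorID\n"
    "1000000000000000001,Positive,101\n"
    "1000000000000000002,Neutral,202\n"
)
for record in read_sentiment_csv(sample):
    print(record)
```

The UTF-8 encoding in the comment is also an assumption; the README does not state the file encoding.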
                                            
CSV files (format: Unknown; description: CSV file):

Name                              Size     MD5
German_Twitter_sentiment.csv      3.27 MB  b6b766a80454a928ce0b90211dd60bab
English_Twitter_sentiment.csv     3.1 MB   8407a2302f20336a8809ba74f6d0112a
Croatian_Twitter_sentiment.csv    2.95 MB  d28be685aa56adf237a3d59e6043ddd7
Bulgarian_Twitter_sentiment.csv   2.02 MB  c1248f10e9130b70036a994a11c44018
Albanian_Twitter_sentiment.csv    1.65 MB  aaebf885a823be2e941a7bf58e1aeb5b
Russian_Twitter_sentiment.csv     3.14 MB  199a3dd666abbb193d5541aac35eb9d6
Bosnian_Twitter_sentiment.csv     1.35 MB  8dcfbeb77c8ae28b1f3211831705e14a
Portuguese_Twitter_sentiment.csv  4.6 MB   446fe4c9be94b69b419cc8c81aea284e
Polish_Twitter_sentiment.csv      6.77 MB  647396610cce71dda658924111a3833b
Hungarian_Twitter_sentiment.csv   2.07 MB  cb7301ef7a528cf4360c8a7303d5d723
Swedish_Twitter_sentiment.csv     1.77 MB  7e7f6885784b4e195bcf02a25d4881dd
Spanish_Twitter_sentiment.csv     8.31 MB  8b0a56106e3764d17787eecc502484f3
Slovenian_Twitter_sentiment.csv   4.03 MB  5717253537f2241c80be514ed135c612
Slovak_Twitter_sentiment.csv      2.14 MB  8889f24b7fbf7662324612440fa8d723
Serbian_Twitter_sentiment.csv     2.22 MB  faeefe277de414e78e27ef679864c0e9
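Each file in this record is listed with an MD5 checksum. A minimal Python sketch for checking a local download against those checksums, assuming all 16 files were saved into one directory under their listed names; the directory name in the usage comment is hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.txt": "8b2d34f643b73d8a44557dc8d2ba6d2f",
    "German_Twitter_sentiment.csv": "b6b766a80454a928ce0b90211dd60bab",
    "English_Twitter_sentiment.csv": "8407a2302f20336a8809ba74f6d0112a",
    "Croatian_Twitter_sentiment.csv": "d28be685aa56adf237a3d59e6043ddd7",
    "Bulgarian_Twitter_sentiment.csv": "c1248f10e9130b70036a994a11c44018",
    "Albanian_Twitter_sentiment.csv": "aaebf885a823be2e941a7bf58e1aeb5b",
    "Russian_Twitter_sentiment.csv": "199a3dd666abbb193d5541aac35eb9d6",
    "Bosnian_Twitter_sentiment.csv": "8dcfbeb77c8ae28b1f3211831705e14a",
    "Portuguese_Twitter_sentiment.csv": "446fe4c9be94b69b419cc8c81aea284e",
    "Polish_Twitter_sentiment.csv": "647396610cce71dda658924111a3833b",
    "Hungarian_Twitter_sentiment.csv": "cb7301ef7a528cf4360c8a7303d5d723",
    "Swedish_Twitter_sentiment.csv": "7e7f6885784b4e195bcf02a25d4881dd",
    "Spanish_Twitter_sentiment.csv": "8b0a56106e3764d17787eecc502484f3",
    "Slovenian_Twitter_sentiment.csv": "5717253537f2241c80be514ed135c612",
    "Slovak_Twitter_sentiment.csv": "8889f24b7fbf7662324612440fa8d723",
    "Serbian_Twitter_sentiment.csv": "faeefe277de414e78e27ef679864c0e9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Return the listed file names that are missing from directory or fail the MD5 check."""
    failed = []
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != expected:
            failed.append(name)
    return failed


# Usage, assuming the files were downloaded into ./twitter_sentiment:
# print(verify("twitter_sentiment") or "all 16 files OK")
```

Reading in chunks keeps memory use flat regardless of file size, although the largest file here is only about 8 MB.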
