
 
dc.contributor.author Rei, Luis
dc.contributor.author Krek, Simon
dc.contributor.author Mladenić, Dunja
dc.date.accessioned 2016-11-28T13:47:36Z
dc.date.available 2016-11-28T13:47:36Z
dc.date.issued 2016-11-28
dc.identifier.uri http://hdl.handle.net/11356/1078
dc.description The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish, manually annotated with part-of-speech tags, named entities, and message-level sentiment polarity. In total, the corpus contains almost 20K annotated messages and over 350K tokens. The corpus is described in: Luis Rei, Dunja Mladenić, Simon Krek. A Multilingual Social Media Linguistic Corpus. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities. 27–28 September 2016, Ljubljana, Slovenia. https://nl.ijs.si/janes/cmc-corpora2016/proceedings/
dc.language.iso spa
dc.language.iso ita
dc.language.iso deu
dc.publisher Jožef Stefan Institute
dc.relation info:eu-repo/grantAgreement/EC/FP7/611346
dc.rights The MIT License (MIT)
dc.rights.uri https://opensource.org/licenses/mit-license.php
dc.rights.label PUB
dc.source.uri https://github.com/lrei/xlime_twitter_corpus
dc.subject social media
dc.subject computer-mediated communication
dc.subject Twitter
dc.subject part-of-speech tagging
dc.subject named entities
dc.subject sentiment classification
dc.subject multilingual
dc.subject manual annotation
dc.title xLiMe Twitter Corpus XTC 1.0.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN.SI data & tools
contact.person Luis Rei luis.rei@ijs.si Jožef Stefan Institute
sponsor ICT Programme FP7-ICT-611346 xLiMe euFunds info:eu-repo/grantAgreement/EC/FP7/611346
size.info 363994 tokens
size.info 19669 texts
files.count 2
files.size 6592396


 Files in this item

Total size of all files in item: 6.29 MB
This item is publicly available and licensed under the MIT License (MIT).
Name: xlime_twitter_corpus-master.zip
Size: 6.14 MB
Format: application/zip
Description: The full xLiMe Twitter Corpus data and code
MD5: a65651e185d92b7aa76de9f52a6aa442
Contents of xlime_twitter_corpus-master.zip:
  • xlime_twitter_corpus-master
    • corpus_task
      • spanish_pos.txt (1 MB)
      • german_ner.txt (454 kB)
      • german_pos.txt (565 kB)
      • german_sentiment.tsv (422 kB)
      • spanish_sentiment.tsv (890 kB)
      • italian_sentiment.tsv (1 MB)
      • italian_ner.txt (1 MB)
      • italian_pos.txt (1 MB)
      • spanish_ner.txt (964 kB)
    • README.md (13 kB)
    • code
      • __init__.py (0 B)
      • twokenize.py (11 kB)
      • stats_task.py (1 kB)
      • xlime2conll.py (2 kB)
      • extract_sentiment.py (1 kB)
      • data.py (1 kB)
      • agreement.py (4 kB)
      • xlime2iaa.py (2 kB)
      • stats.py (2 kB)
      • experiment.py (4 kB)
      • seq.py (4 kB)
      • pretag.py (5 kB)
    • requirements.txt (25 B)
    • data
      • italian_task_1442142987.tsv (10 MB)
      • german_task_1442142996.tsv (4 MB)
      • spanish_task_1440847551.tsv (9 MB)
    • agreement
      • german_sent.iaa (1 kB)
      • italian_ner.iaa (9 kB)
      • italian_pos.iaa (22 kB)
      • spanish_ner.iaa (6 kB)
      • german_ner.iaa (8 kB)
      • spanish_pos.iaa (15 kB)
      • german_pos.iaa (17 kB)
      • spanish_sent.iaa (1 kB)
      • italian_sent.iaa (1 kB)
    • guidelines.md (8 kB)
    • LICENSE.md (1 kB)
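The corpus_task directory holds one file per language and task (for example spanish_pos.txt, german_ner.txt, italian_sentiment.tsv). The following is a minimal Python sketch for locating these files inside the downloaded archive and reading a POS or NER file. It relies on the file naming seen in the listing above, but it also ASSUMES (this record does not confirm it) that the .txt files use a CoNLL-like layout: one token per line, tab-separated columns with the tag in the last column, a blank line between messages, and UTF-8 encoding. The helper names find_task_files, read_tagged and load_task are hypothetical; check README.md and guidelines.md in the archive for the actual format.

```python
from __future__ import annotations

import zipfile
from pathlib import PurePosixPath
from typing import IO


def find_task_files(zf: zipfile.ZipFile) -> dict[tuple[str, str], str]:
    """Map (language, task) to archive member paths under corpus_task/.

    Uses only the naming pattern from the file listing, e.g.
    corpus_task/spanish_pos.txt -> ("spanish", "pos").
    """
    found: dict[tuple[str, str], str] = {}
    for name in zf.namelist():
        path = PurePosixPath(name)
        if path.parent.name != "corpus_task":
            continue  # skip code/, data/, agreement/ and top-level files
        lang, sep, task = path.stem.partition("_")
        if sep:
            found[(lang, task)] = name
    return found


def read_tagged(text: str) -> list[list[tuple[str, str]]]:
    """Split CoNLL-like text into messages of (token, tag) pairs.

    ASSUMPTION: one token per line, tab-separated columns with the tag
    in the last column, and a blank line between messages.
    """
    messages: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line closes the current message
                messages.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected token<TAB>tag, got {line!r}")
        current.append((fields[0], fields[-1]))
    if current:  # the last message may lack a trailing blank line
        messages.append(current)
    return messages


def load_task(source: str | IO[bytes], lang: str, task: str) -> list[list[tuple[str, str]]]:
    """Read one POS or NER file directly from the zip (UTF-8 assumed)."""
    with zipfile.ZipFile(source) as zf:
        member = find_task_files(zf)[(lang, task)]
        return read_tagged(zf.read(member).decode("utf-8"))
```

If the format assumption holds, load_task("xlime_twitter_corpus-master.zip", "spanish", "pos") returns the Spanish POS data as a list of messages. The .tsv sentiment files carry message-level labels rather than per-token tags, so read_tagged is not meant for them; reading straight from the zip avoids unpacking the archive first.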
Name: CMC-2016_Rei_et_al_Multilingual-Social-Media-Linguistic-Corpus.pdf
Size: 151.65 KB
Format: PDF
Description: Paper describing the corpus
MD5: d2e2a0b00a4d389f40b55f4972901a37
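The MD5 values listed for both files make it possible to check that a download arrived intact. Below is a minimal Python sketch using the standard hashlib module; the helper names md5_of and verify are hypothetical, and it assumes the downloaded files keep the names shown in this record. On Linux, `md5sum <file>` from GNU coreutils gives the same digest. Note that MD5 detects accidental corruption but is not a safeguard against deliberate tampering.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# MD5 digests as listed in this repository record.
EXPECTED_MD5 = {
    "xlime_twitter_corpus-master.zip": "a65651e185d92b7aa76de9f52a6aa442",
    "CMC-2016_Rei_et_al_Multilingual-Social-Media-Linguistic-Corpus.pdf": "d2e2a0b00a4d389f40b55f4972901a37",
}


def md5_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir: str | Path = ".") -> dict[str, bool]:
    """Report, per expected file, whether it exists and its MD5 matches."""
    results: dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

For example, verify("downloads") returns a mapping from each file name to True or False; a missing file and a mismatched digest both report False.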
