dc.contributor.author Evkoski, Bojan
dc.contributor.author Pelicon, Andraž
dc.contributor.author Mozetič, Igor
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Kralj Novak, Petra
dc.date.accessioned 2021-07-21T17:02:01Z
dc.date.available 2021-07-21T17:02:01Z
dc.date.issued 2021-07-20
dc.identifier.uri http://hdl.handle.net/11356/1423
dc.description The dataset represents Slovenian-language Twitter production from 2018 until 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and hate speech labels (acceptable, inappropriate, offensive, violent) assigned automatically with the model at https://huggingface.co/IMSyPP/hate_speech_slo. The dataset is the basis for the following two papers: - "Retweet communities reveal the main source of hate speech" - https://arxiv.org/pdf/2105.14898.pdf - "Community evolution in retweet networks" - https://arxiv.org/pdf/2105.06214.pdf
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://arxiv.org/pdf/2105.14898.pdf
dc.relation.isreferencedby https://arxiv.org/pdf/2105.06214.pdf
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://imsypp.ijs.si
dc.subject Twitter
dc.subject hate speech
dc.subject retweet networks
dc.title Slovenian Twitter dataset 2018-2020 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor European Union’s Rights, Equality and Citizenship Programme 875263 IMSyPP - Innovative Monitoring Systems and Prevention Policies of Online Hate Speech Other
sponsor ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 12961136 texts
files.count 1
files.size 190882288


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name clarin_plos.zip
Size 182.04 MB
Format application/zip
Description Dataset in CSV format
MD5 58a693968c40b81b4cf483265e918a6a
 File Preview
    • README.txt (843 B)
    • clarin_plos_15072021.csv (609 MB)
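
The MD5 value listed for clarin_plos.zip can be used to check that a download is intact. The following is a minimal sketch, not part of the record itself: it assumes the archive was saved as clarin_plos.zip in the current directory, and the helper names md5_of_file and verify_download are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record for clarin_plos.zip.
EXPECTED_MD5 = "58a693968c40b81b4cf483265e918a6a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("clarin_plos.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low for the 182.04 MB archive. A mismatch usually indicates an incomplete or corrupted download.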
