The dataset covers tweets posted in Slovenian from 2018 to 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and hate speech labels (acceptable, inappropriate, offensive, violent) assigned automatically with the model at https://huggingface.co/IMSyPP/hate_speech_slo.
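The Python sketch below shows how records with these fields might be summarized into a label distribution and a weighted retweet edge list. It is illustrative only. The CSV layout and the column names (`tweet_id`, `retweet_id`, `user_id`, `date`, `hate_label`) are hypothetical and are not taken from the released files. The sketch also assumes that `retweet_id` holds the ID of the retweeted tweet (empty for original tweets) and that the original tweet appears in the same data.

```python
import csv
import io
from collections import Counter

# Label set as described for the dataset; the column names used below are hypothetical.
LABELS = ("acceptable", "inappropriate", "offensive", "violent")


def load_rows(handle):
    """Read tweet records from a CSV file-like object (hypothetical layout)."""
    return list(csv.DictReader(handle))


def label_distribution(rows):
    """Count how many records carry each automatically assigned label."""
    counts = Counter(row["hate_label"] for row in rows)
    return {label: counts.get(label, 0) for label in LABELS}


def retweet_edges(rows):
    """Build weighted (retweeter, original author) edges.

    Assumes `retweet_id` is the ID of the retweeted tweet (empty for original
    tweets) and that the retweeted tweet is itself present in `rows`.
    """
    author_of = {row["tweet_id"]: row["user_id"] for row in rows}
    edges = Counter()
    for row in rows:
        original = row["retweet_id"]
        if original and original in author_of:
            edges[(row["user_id"], author_of[original])] += 1
    return dict(edges)


# Made-up sample records, for illustration only.
SAMPLE = """tweet_id,retweet_id,user_id,date,hate_label
1,,u1,2018-01-05,acceptable
2,,u2,2018-01-06,offensive
3,2,u3,2018-01-07,offensive
4,2,u1,2018-01-08,offensive
5,1,u3,2018-01-09,acceptable
"""

rows = load_rows(io.StringIO(SAMPLE))
print(label_distribution(rows))
# {'acceptable': 2, 'inappropriate': 0, 'offensive': 3, 'violent': 0}
print(retweet_edges(rows))
# {('u3', 'u2'): 1, ('u1', 'u2'): 1, ('u3', 'u1'): 1}
```

Edges are directed from the retweeter to the original author and are weighted by retweet count. This is a common way to set up a retweet graph for community detection. Retweets whose source tweet is missing from the data are skipped rather than guessed.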
The dataset is the basis for the following two papers:
- "Retweet communities reveal the main source of hate speech" - https://arxiv.org/pdf/2105.14898.pdf
- "Community evolution in retweet networks" - https://arxiv.org/pdf/2105.06214.pdf
Funding:
- European Union’s Rights, Equality and Citizenship Programme, 875263: "IMSyPP - Innovative Monitoring Systems and Prevention Policies of Online Hate Speech"
- ARRS (Slovenian Research Agency), N6-0099: "LiLaH: Linguistic Landscape of Hate Speech"
- ARRS (Slovenian Research Agency), P6-0411: "Language Resources and Technologies for Slovene"
- ARRS (Slovenian Research Agency), P2-103: "Knowledge Technologies"