dc.contributor.author | Evkoski, Bojan |
dc.contributor.author | Pelicon, Andraž |
dc.contributor.author | Mozetič, Igor |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Kralj Novak, Petra |
dc.date.accessioned | 2021-07-21T17:02:01Z |
dc.date.available | 2021-07-21T17:02:01Z |
dc.date.issued | 2021-07-20 |
dc.identifier.uri | http://hdl.handle.net/11356/1423 |
dc.description | The dataset covers Slovenian-language Twitter activity from 2018 to 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and hate-speech labels (acceptable, inappropriate, offensive, violent) assigned automatically with the model at https://huggingface.co/IMSyPP/hate_speech_slo. The dataset is the basis for the following two papers: "Retweet communities reveal the main source of hate speech" (https://arxiv.org/pdf/2105.14898.pdf) and "Community evolution in retweet networks" (https://arxiv.org/pdf/2105.06214.pdf). |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://arxiv.org/pdf/2105.14898.pdf |
dc.relation.isreferencedby | https://arxiv.org/pdf/2105.06214.pdf |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://imsypp.ijs.si |
dc.subject | hate speech |
dc.subject | retweet networks |
dc.title | Slovenian Twitter dataset 2018-2020 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | European Union’s Rights, Equality and Citizenship Programme 875263 IMSyPP - Innovative Monitoring Systems and Prevention Policies of Online Hate Speech Other |
sponsor | ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 12961136 texts |
files.count | 1 |
files.size | 190882288 |
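The dc.description field lists the per-tweet fields, but the record does not document the CSV header or the name of the CSV member inside `clarin_plos.zip`. The sketch below is a minimal illustration, not a documented loader. It assumes hypothetical column names `user_id`, `retweeted_user_id` and `hate_class` (rename them after inspecting the file) and shows how rows could be tallied by hate label and turned into a weighted retweet network of the kind studied in the two referenced papers. The edge orientation (from the retweeted user to the retweeter) is also an assumption.

```python
import csv
import io
import zipfile
from collections import Counter

# HYPOTHETICAL column names: the record does not document the CSV header.
USER_COL = "user_id"
RETWEETED_USER_COL = "retweeted_user_id"
LABEL_COL = "hate_class"

# The four labels named in dc.description.
LABELS = ("acceptable", "inappropriate", "offensive", "violent")


def iter_rows(zip_source):
    """Yield every row (as a dict) from each .csv member of the archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)


def label_counts(rows):
    """Count rows per hate-speech label, reporting 0 for absent labels."""
    counts = Counter(row[LABEL_COL] for row in rows)
    return {label: counts.get(label, 0) for label in LABELS}


def retweet_edges(rows):
    """Build a weighted retweet network as (retweeted user, retweeter) -> count.

    Rows whose retweeted-user field is empty are treated as original tweets
    and contribute no edge (an assumption about how the CSV encodes them).
    """
    edges = Counter()
    for row in rows:
        source = row.get(RETWEETED_USER_COL)
        if source:
            edges[(source, row[USER_COL])] += 1
    return edges
```

Because `iter_rows` is a generator, either materialize it with `list(...)` or call it once per pass. For a dataset of about 13 million texts, a single streaming pass that updates both counters is lighter on memory than holding every row.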
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name | clarin_plos.zip |
Size | 182.04 MB |
Format | application/zip |
Description | Dataset in CSV format |
MD5 | 58a693968c40b81b4cf483265e918a6a |
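The record gives both the archive's MD5 checksum and its exact size (files.size, 190882288 bytes), so a download can be checked before unpacking. The following is a minimal sketch using only Python's standard library. The function names are illustrative and are not part of any CLARIN.SI tooling.

```python
import hashlib
import os

# Values copied from this record: the MD5 in the file listing and files.size.
EXPECTED_MD5 = "58a693968c40b81b4cf483265e918a6a"
EXPECTED_SIZE = 190882288  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check the cheap file size first, then compare the full MD5 digest."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

For an intact copy of the archive, `verify_download("clarin_plos.zip")` should return `True`. The size check rejects truncated downloads without hashing the whole file.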