<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href='static/style.xsl' type='text/xsl'?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-21T22:17:39Z</responseDate><request verb="GetRecord" identifier="oai:www.clarin.si:11356/1450" metadataPrefix="oai_dc">http://www.clarin.si/repository/oai/request</request><GetRecord><record><header><identifier>oai:www.clarin.si:11356/1450</identifier><datestamp>2023-03-27T17:01:18Z</datestamp><setSpec>hdl_11356_1023</setSpec><setSpec>hdl_11356_1024</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Italian YouTube Hate Speech Corpus</dc:title>
<dc:creator>Cinelli, Matteo</dc:creator>
<dc:creator>Pelicon, Andraž</dc:creator>
<dc:creator>Mozetič, Igor</dc:creator>
<dc:creator>Quattrociocchi, Walter</dc:creator>
<dc:creator>Kralj Novak, Petra</dc:creator>
<dc:creator>Zollo, Fabiana</dc:creator>
<dc:subject>hate speech</dc:subject>
<dc:subject>misinformation</dc:subject>
<dc:subject>YouTube</dc:subject>
<dc:description>We present an Italian YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from Italian YouTube comments on videos about the Covid-19 pandemic, collected between January 2020 and May 2020. Two sets were annotated: a training set of 59,870 comments (IMSyPP_IT_YouTube_comments_train.csv) and an evaluation set of 10,536 comments (IMSyPP_IT_YouTube_comments_evaluation.csv). The dataset was annotated by 8 annotators, and each comment was annotated by two of them. It was used to train a hate speech type classification model, which is publicly available at https://huggingface.co/IMSyPP/hate_speech_it.&#xd;
&#xd;
The dataset consists of the following fields:&#xd;
ID_Commento - YouTube ID of the comment&#xd;
ID_Video - YouTube ID of the video under which the comment was posted&#xd;
Testo - text of the comment&#xd;
Tipo - type of hate speech&#xd;
Target - the target of hate speech&#xd;
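
The field list above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader. It assumes that the files are comma-delimited UTF-8 with a header row using exactly these field names, and that each of a comment's two annotations occupies its own row. This record states neither assumption. The helper names load_annotations and tipo_agreement are hypothetical, and the sample rows and Tipo values in the demo are made-up placeholders, not the corpus's real label set.

```python
import csv
import io
from collections import Counter, defaultdict

# Field names as listed in this record. The comma delimiter and the
# one-row-per-annotation layout are assumptions, not documented facts.
FIELDS = ("ID_Commento", "ID_Video", "Testo", "Tipo", "Target")


def load_annotations(handle, delimiter=","):
    """Read annotation rows from an open text handle into dicts,
    failing early if any documented column is missing."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = set(FIELDS) - set(reader.fieldnames or ())
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def tipo_agreement(rows):
    """Group rows by comment ID (assuming one row per annotation) and
    return (comments with exactly two labels, comments whose two Tipo
    labels are identical)."""
    labels = defaultdict(list)
    for row in rows:
        labels[row["ID_Commento"]].append(row["Tipo"])
    pairs = [v for v in labels.values() if len(v) == 2]
    agreed = sum(1 for a, b in pairs if a == b)
    return len(pairs), agreed


if __name__ == "__main__":
    # Tiny made-up sample: c1 gets matching labels, c2 does not.
    sample = io.StringIO(
        "ID_Commento,ID_Video,Testo,Tipo,Target\n"
        "c1,v1,testo uno,label_x,\n"
        "c1,v1,testo uno,label_x,\n"
        "c2,v1,testo due,label_x,\n"
        "c2,v1,testo due,label_y,\n"
    )
    rows = load_annotations(sample)
    print(tipo_agreement(rows))  # (2, 1)
    print(Counter(r["Tipo"] for r in rows))
```

To read a real file, pass open("IMSyPP_IT_YouTube_comments_train.csv", encoding="utf-8", newline="") as the handle. If the file turns out to use another separator, adjust the delimiter argument.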
&#xd;
Additionally, we include Italian YouTube data (SR_YT_comments.csv) that were collected in the same period as the training data and labeled automatically with the aforementioned model. These automatically labeled data were used to analyze the relationship between hate speech and misinformation on Italian YouTube. The results of this analysis are presented in the associated paper.&#xd;
&#xd;
The analyzed data contain the following fields:&#xd;
ID_Commento - YouTube ID of the comment&#xd;
Label - label automatically assigned by the model&#xd;
is_questionable - the type of channel the comment was collected from; each channel is categorized as spreading either reliable or questionable information.</dc:description>
<dc:date>2021-10-01</dc:date>
<dc:type>corpus</dc:type>
<dc:identifier>http://hdl.handle.net/11356/1450</dc:identifier>
<dc:language>ita</dc:language>
<dc:relation>https://arxiv.org/abs/2105.14005</dc:relation>
<dc:rights>Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
<dc:rights>PUB</dc:rights>
<dc:format>text/plain; charset=utf-8</dc:format>
<dc:format>text/csv</dc:format>
<dc:format>text/csv</dc:format>
<dc:format>text/csv</dc:format>
<dc:format>downloadable_files_count: 3</dc:format>
<dc:publisher>Jožef Stefan Institute</dc:publisher>
<dc:source>http://imsypp.ijs.si/</dc:source>
</oai_dc:dc>
</metadata></record></GetRecord></OAI-PMH>