French and Slovene offensive language metaphor and metonymy annotated dataset FRENK-MRW 1.0

Name: French and Slovene offensive language metaphor and metonymy annotated dataset FRENK-MRW 1.0
License: https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0

Pahor de Maiti Tekavčič, Kristina


dc.contributor.author	Pahor de Maiti Tekavčič, Kristina
dc.date.accessioned	2025-05-13T09:56:40Z
dc.date.available	2025-05-13T09:56:40Z
dc.date.issued	2025-05-09
dc.identifier.uri	http://hdl.handle.net/11356/2030
dc.description	The FRENK-MRW dataset contains French and Slovene socially unacceptable Facebook comments that were manually annotated for metaphor and metonymy, based on the observed incongruity between basic and contextual meaning. The comments were posted between 2015 and 2017 under Facebook posts by major news media outlets on two topics: LGBTQIA+/homophobia and migration/islamophobia. This entry includes the dataset divided into four CSV files: two with French comments (metadata: meta_fr; metaphor/metonymy annotations: mrw_fr) and two with Slovene comments (metadata: meta_sl; metaphor/metonymy annotations: mrw_sl). The annotation guidelines and a README file explaining the file structure are also attached, both as TXT files.
The dataset uses a selection of Slovene socially unacceptable comments from FRENK 1.1 (http://hdl.handle.net/11356/1462) and French socially unacceptable comments from FRENK-fr 1.0 (http://hdl.handle.net/11356/1947). The French data from FRENK-fr 1.0 was linguistically annotated with the FreeLing tagger (https://aclanthology.org/L12-1224/), while the Slovene data from FRENK 1.1 was processed with the CLASSLA tagger (http://hdl.handle.net/11356/1337). Manual annotations were performed in a WebAnno deployment (webanno.github.io/webanno) hosted at CLARIN.SI.
FRENK-MRW comprises 2,000 comments attached to a selection of news items (POST_CONTENT (NEWS) column). The news items were chosen according to two criteria: (1) for ease of annotation and interpretation, the entire comment thread had to be included (acceptable comments were excluded from annotation), and (2) the comments linked to the selected news posts had to total 2,000, distributed equally between the two languages (French, Slovene) and the two topics (migrants, LGBT). The French part of the dataset includes posts from Le Figaro and 20 minutes, with LGBT-related news coming only from the latter. In the Slovene part, posts on both topics (migrants and LGBT) come from Nova24TV, Siol.net and 24ur.
Together, the 2,000 comments contain 84,738 tokens. Not all comments contain metaphors: 541 comments in the French part and 571 in the Slovene part contain at least one metaphorically used token. In total, there are 1,192 metaphorically used tokens in the French part of the dataset and 1,270 in the Slovene part.
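The reported counts can be cross-checked with a few lines of arithmetic. The sketch below is a minimal Python illustration, not part of the dataset; it assumes exactly 1,000 comments per language and 500 per language/topic cell, which follows from the equal split described above but is not given as a per-file count. All names in it are hypothetical.

```python
# Figures reported in the FRENK-MRW 1.0 description.
TOTAL_COMMENTS = 2_000
TOTAL_TOKENS = 84_738
LANGUAGES = ("fr", "sl")
TOPICS = ("migrants", "LGBT")

# Per language: comments with at least one metaphorically used token,
# and the total number of metaphorically used tokens.
COMMENTS_WITH_MRW = {"fr": 541, "sl": 571}
MRW_TOKENS = {"fr": 1_192, "sl": 1_270}


def comments_per_cell(total: int = TOTAL_COMMENTS) -> int:
    """Comments per (language, topic) cell, assuming an exactly equal split."""
    cells = len(LANGUAGES) * len(TOPICS)
    if total % cells:
        raise ValueError("total does not divide evenly into cells")
    return total // cells


def per_language_summary() -> dict[str, dict[str, float]]:
    """Share of comments with metaphor, and mean metaphorical tokens per such comment."""
    # Assumption: the equal split gives 1,000 comments per language.
    per_language = TOTAL_COMMENTS // len(LANGUAGES)
    return {
        lang: {
            "share_of_comments": COMMENTS_WITH_MRW[lang] / per_language,
            "tokens_per_mrw_comment": MRW_TOKENS[lang] / COMMENTS_WITH_MRW[lang],
        }
        for lang in LANGUAGES
    }


def mrw_token_share() -> float:
    """Fraction of all tokens in the dataset that are metaphorically used."""
    return sum(MRW_TOKENS.values()) / TOTAL_TOKENS


if __name__ == "__main__":
    print("comments per language/topic cell:", comments_per_cell())
    for lang, stats in per_language_summary().items():
        print(f"{lang}: {stats['share_of_comments']:.1%} of comments, "
              f"{stats['tokens_per_mrw_comment']:.2f} metaphorical tokens per such comment")
    print(f"metaphorically used tokens overall: {mrw_token_share():.2%}")
```

Under these assumptions, 54.1% of French and 57.1% of Slovene comments contain at least one metaphorically used token, each such comment holds roughly 2.2 of them, and about 2.9% of all tokens are metaphorically used.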
dc.language.iso	eng
dc.language.iso	fra
dc.language.iso	slv
dc.publisher	Faculty of Arts, University of Ljubljana
dc.publisher	Institute of Contemporary History
dc.publisher	CY Cergy Paris University
dc.rights	CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri	https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label	ACA
dc.subject	offensive language
dc.subject	hate speech
dc.subject	user comment
dc.subject	social media
dc.subject	metaphor
dc.subject	metonymy
dc.title	French and Slovene offensive language metaphor and metonymy annotated dataset FRENK-MRW 1.0
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Kristina Pahor de Maiti Tekavčič kristina.pahordemaiti@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor	Slovenian Research Agency (ARIS) P6-0436 Digital humanities: resources, tools and methods nationalFunds
sponsor	CY Cergy Paris University (Paris Seine Initiative), EU (EUTOPIA) 22IAGOD744 The linguistic landscape of hateful discourse online in France and Slovenia Other
sponsor	Slovenian Research Agency (ARIS) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
sponsor	Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info	84738 tokens
size.info	2000 texts
files.count	1
files.size	1906250

Files in this item

This item is licensed for Academic Use under: CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0

Name: frenk-mrw.zip
Size: 1.82 MB
Format: application/zip
Description: README, annotation guidelines, CSV metadata and data
MD5: 35fe363e51534bb5302e46206687de97
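Before use, the downloaded archive can be checked against the MD5 value listed above, and its CSV files can be read without unpacking. The following is a minimal Python sketch using only the standard library. It assumes that the four files are named meta_fr.csv, mrw_fr.csv, meta_sl.csv and mrw_sl.csv, that they are UTF-8 encoded and comma-delimited, and that they may sit in any folder inside the archive; these assumptions should be checked against the bundled README. The helper names md5_of and load_csvs are hypothetical.

```python
import csv
import hashlib
import io
import zipfile
from pathlib import Path, PurePosixPath

# MD5 checksum published on the repository page for frenk-mrw.zip.
EXPECTED_MD5 = "35fe363e51534bb5302e46206687de97"

# Base names of the four CSV files described in the record. The ".csv"
# extension and the encoding/delimiter used below are assumptions.
CSV_NAMES = ("meta_fr", "mrw_fr", "meta_sl", "mrw_sl")


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_csvs(zip_path: Path, names=CSV_NAMES) -> dict[str, list[dict[str, str]]]:
    """Read each named CSV from the archive into a list of row dictionaries.

    Members are matched by file name regardless of folder, because the
    archive's internal layout is not documented in this record.
    """
    tables = {}
    with zipfile.ZipFile(zip_path) as zf:
        # Map bare file names to full member paths, skipping directory entries.
        by_name = {
            PurePosixPath(m).name: m for m in zf.namelist() if not m.endswith("/")
        }
        for name in names:
            member = by_name.get(f"{name}.csv")
            if member is None:
                raise FileNotFoundError(f"{name}.csv not found in {zip_path}")
            # "utf-8-sig" also accepts files that start with a byte-order mark.
            with zf.open(member) as raw, io.TextIOWrapper(
                raw, encoding="utf-8-sig", newline=""
            ) as text:
                tables[name] = list(csv.DictReader(text))
    return tables
```

For example, compare `md5_of(Path("frenk-mrw.zip"))` with `EXPECTED_MD5` and, if they match, call `load_csvs(Path("frenk-mrw.zip"))`. Each table is keyed by the CSV header row, so the sketch does not assume any particular column names.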
