dc.contributor.author |
Pahor de Maiti Tekavčič, Kristina |
dc.date.accessioned |
2025-05-13T09:56:40Z |
dc.date.available |
2025-05-13T09:56:40Z |
dc.date.issued |
2025-05-09 |
dc.identifier.uri |
http://hdl.handle.net/11356/2030 |
dc.description |
The FRENK-MRW dataset contains French and Slovene socially unacceptable Facebook comments that are manually annotated for metaphor and metonymy based on the observed incongruity between the basic and contextual meaning. The comments were posted between 2015 and 2017 under Facebook posts published by major news media outlets on the topics of LGBTQIA+/homophobia and migration/islamophobia. This entry includes the dataset divided into four files in CSV format, two with French comments (metadata: meta_fr, metaphor/metonymy annotations: mrw_fr) and two with Slovene comments (metadata: meta_sl, metaphor/metonymy annotations: mrw_sl). Annotation guidelines and a README file explaining the file structure are also attached, both as TXT files.
The dataset uses a selection of Slovene socially unacceptable comments from FRENK 1.1 (http://hdl.handle.net/11356/1462) and French socially unacceptable comments from FRENK-fr 1.0 (http://hdl.handle.net/11356/1947). French data from FRENK-fr 1.0 was linguistically annotated with the FreeLing tagger (https://aclanthology.org/L12-1224/), while Slovene data from FRENK 1.1 was processed with the CLASSLA tagger (http://hdl.handle.net/11356/1337). Manual annotation was performed in a WebAnno deployment (webanno.github.io/webanno) hosted at CLARIN.SI.
FRENK-MRW represents a set of 2,000 comments based on a selection of news items (POST_CONTENT (NEWS) column), which were chosen according to two criteria: (1) for ease of annotation and interpretation, the entire comment thread had to be included (although acceptable comments were excluded from annotation), and (2) the comments linked to these news posts had to total 2,000, equally distributed between the two languages (French, Slovene) and the two topics (migrants, LGBT). The French part of the dataset includes posts from Le Figaro and 20 minutes, with LGBT-related news coming only from the latter. In the Slovene part, the posts on both topics (migrants and LGBT) come from Nova24TV, Siol.net and 24ur.
The dataset contains 2,000 comments with 84,738 tokens. Not all comments contain metaphors: 541 comments in the French part and 571 comments in the Slovene part contain at least one metaphorically used token. In total, there are 1,192 metaphorically used tokens in the French part of the dataset and 1,270 in the Slovene part. |
dc.language.iso |
eng |
dc.language.iso |
fra |
dc.language.iso |
slv |
dc.publisher |
Faculty of Arts, University of Ljubljana |
dc.publisher |
Institute of Contemporary History |
dc.publisher |
CY Cergy Paris University |
dc.rights |
CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0 |
dc.rights.uri |
https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0 |
dc.rights.label |
ACA |
dc.subject |
offensive language |
dc.subject |
hate speech |
dc.subject |
user comment |
dc.subject |
social media |
dc.subject |
metaphor |
dc.subject |
metonymy |
dc.title |
French and Slovene offensive language metaphor and metonymy annotated dataset FRENK-MRW 1.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Kristina Pahor de Maiti Tekavčič kristina.pahordemaiti@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
sponsor |
Slovenian Research Agency (ARIS) P6-0436 Digital humanities: resources, tools and methods nationalFunds |
sponsor |
CY Cergy Paris University (Paris Seine Initiative), EU (EUTOPIA) 22IAGOD744 The linguistic landscape of hateful discourse online in France and Slovenia Other |
sponsor |
Slovenian Research Agency (ARIS) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds |
sponsor |
Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
size.info |
84738 tokens |
size.info |
2000 texts |
files.count |
1 |
files.size |
1906250 |