Multilingual dataset of COVID tweets for relation-level metaphor analysis TCMeta 1.0

Name: Multilingual dataset of COVID tweets for relation-level metaphor analysis TCMeta 1.0
License: https://creativecommons.org/licenses/by/4.0/

Authors: Brglez, Mojca; Zayed, Omnia; Buitelaar, Paul


Item identifier: http://hdl.handle.net/11356/1787

Documented in: https://doi.org/10.1007/s10579-024-09725-z

Date of publication: 2023-01-24

Type: corpus, text

Size: 4359 entries

Language(s): English, Slovenian

Description: TCMeta is a dataset of noun phrase constructions from COVID-related tweets, annotated for relation-level metaphor. It contains 2,138 Slovene and 2,221 English instances in tab-separated values (.tsv) format, where each line presents a unique phrase extracted from a COVID-related tweet. The primary annotation is the COVID metaphor label (whether the phrase expresses a metaphor relating to COVID); additional annotations mark idioms, metaphors not relating to COVID, and metaphors not evident at the relation level. The full text of the tweets could not be published because of the Terms of Service (ToS) of the Twitter platform at the time. We recommend retrieving the text of the tweets via their IDs using the Hydrator tool [https://github.com/docnow/hydrator] or a similar tool. The dataset is further described in: Brglez, M., Zayed, O. & Buitelaar, P. TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis. Lang Resources & Evaluation 59, 437–475 (2025). https://doi.org/10.1007/s10579-024-09725-z.

BibTeX:

@article{brglez2025tcmeta,
  title={{TCMeta}: a multilingual dataset of {COVID} tweets for relation-level metaphor analysis},
  author={Brglez, Mojca and Zayed, Omnia and Buitelaar, Paul},
  journal={Language Resources and Evaluation},
  pages={437--475},
  volume={59},
  year={2025},
  publisher={Springer},
  doi={10.1007/s10579-024-09725-z}
}
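Since only tweet IDs are distributed, a first step before hydration is to collect them into a plain list. The following is a minimal sketch, not part of the dataset release: it assumes the TSV has a header row whose ID column is named "Tweet ID" (as in README.txt), that fields are unquoted, and that Hydrator accepts a text file with one tweet ID per line. The helper name write_tweet_ids is hypothetical. Because several phrases may come from the same tweet, IDs are de-duplicated while keeping their first-seen order.

```python
import csv
from pathlib import Path

# Assumption: the header names the ID column "Tweet ID", as listed in README.txt;
# adjust this constant if the actual header in TCMeta.v1.tsv differs.
ID_COLUMN = "Tweet ID"


def write_tweet_ids(tsv_path: str, out_path: str) -> int:
    """Write each distinct tweet ID to out_path, one per line; return the count."""
    seen: set[str] = set()
    ordered: list[str] = []
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # Assumption: fields are unquoted, so QUOTE_NONE keeps quote characters
        # inside phrases literally instead of treating them as field delimiters.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # IDs are kept as strings so that no numeric conversion can alter them.
            tweet_id = row[ID_COLUMN].strip()
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                ordered.append(tweet_id)
    Path(out_path).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return len(ordered)
```

For example, write_tweet_ids("TCMeta.v1.tsv", "tweet_ids.txt") would produce a file that can then be loaded into Hydrator or a similar tool.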

Publisher: Faculty of Arts, University of Ljubljana

Keywords: metaphor, Twitter, social media, COVID-19, manual annotation

Collections: CLARIN.SI data & tools


Files in this item

Total size of all files in this item: 228.99 KB

This item is publicly available and licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

Name: README.txt
Size: 2.39 KB
Format: Text file
Description: description
MD5: 425c9b3f580725daba40877c2d73eecc

Preview of README.txt:

TCMeta is a multilingual dataset of COVID tweets for relation-level metaphor analysis.

It contains 2,138 Slovene and 2,221 English noun phrase constructions extracted from COVID-related tweets that are annotated for relation-level metaphor.


The data is in tab-separated values (.tsv) format. Each line presents a unique phrase extracted from a COVID-related tweet.

The primary annotation is in the "COVID metaphor label" column (whether the phrase expresses a metaphor relating to COVID). Additional annotations are in the "Comments" column and mark idioms, metaphors not relating to COVID, and metaphors not evident at the relation level.


The data contains the following columns:

Language		the language of the tweet, 'sl' (Slovene) or 'en' (English) 
Tweet ID		the unique identifier of the tweet, which can be used to retrieve the text of the post
Phrase			the phrase extracted from the tweet 
COVID metaphor label	'y' (Yes) or 'n' (No): whether it is . . .
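The columns described in README.txt can be tallied with the standard library alone. This is a minimal sketch under stated assumptions: the header row uses the README column names "Language" and "COVID metaphor label", fields are unquoted, and the helper names label_counts and per_language_totals are hypothetical. Labels are lower-cased so that 'Y' and 'y' are counted together.

```python
import csv
from collections import Counter

# Assumption: header names follow README.txt; the exact strings in
# TCMeta.v1.tsv may differ and should be checked before use.
LANG_COLUMN = "Language"
LABEL_COLUMN = "COVID metaphor label"


def label_counts(tsv_path: str) -> Counter:
    """Count rows per (language, label) pair, e.g. ('sl', 'y')."""
    counts: Counter = Counter()
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # Assumption: unquoted tab-separated fields.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            lang = row[LANG_COLUMN].strip()
            # Normalise case so 'Y'/'y' and 'N'/'n' fall into the same bucket.
            label = row[LABEL_COLUMN].strip().lower()
            counts[(lang, label)] += 1
    return counts


def per_language_totals(counts: Counter) -> Counter:
    """Collapse (language, label) counts into per-language row totals."""
    totals: Counter = Counter()
    for (lang, _label), n in counts.items():
        totals[lang] += n
    return totals
```

If the file matches this record's description, per_language_totals(label_counts("TCMeta.v1.tsv")) should report 2,138 rows for 'sl' and 2,221 for 'en', which is a quick sanity check after download.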

Name: TCMeta.v1.tsv
Size: 226.6 KB
Format: Unknown
Description: dataset
MD5: a61a7c336d66d87f3822be638a744cb9
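The MD5 checksums listed above can be used to confirm that a download is complete and unmodified. The sketch below assumes both files were saved under their original names in a single directory; the helper names md5_of and verify are hypothetical. Files are hashed in chunks so that memory use stays constant regardless of file size.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "README.txt": "425c9b3f580725daba40877c2d73eecc",
    "TCMeta.v1.tsv": "a61a7c336d66d87f3822be638a744cb9",
}


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str) -> dict[str, bool]:
    """Map each expected file name to whether its checksum matches."""
    return {
        name: md5_of(str(Path(directory) / name)) == expected
        for name, expected in EXPECTED_MD5.items()
    }
```

Calling verify(".") in the download directory should return True for both entries; a False value suggests a truncated or altered file.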
