Choice of plausible alternatives dataset in Macedonian COPA-MK

Name: Choice of plausible alternatives dataset in Macedonian COPA-MK
License: https://creativecommons.org/licenses/by-sa/4.0/

Ljubešić, Nikola; Koloski, Boshko; Zdravkovska, Kristina; Kuzman, Taja

dc.contributor.author	Ljubešić, Nikola
dc.contributor.author	Koloski, Boshko
dc.contributor.author	Zdravkovska, Kristina
dc.contributor.author	Kuzman, Taja
dc.date.accessioned	2022-10-22T12:13:05Z
dc.date.available	2022-10-22T12:13:05Z
dc.date.issued	2022-09-26
dc.identifier.uri	http://hdl.handle.net/11356/1687
dc.description	The COPA-MK dataset (Choice of plausible alternatives in Macedonian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html) that follows the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), plus a label encoding which of the two choices the annotator or translator judged more plausible (The sun was rising). The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files. Translation quality was ensured with the help of the ReLDI Centre Belgrade.
dc.language.iso	mkd
dc.publisher	Jožef Stefan Institute
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://www.clarin.si/info/k-centre/
dc.subject	commonsense reasoning
dc.subject	manual annotation
dc.subject	manual translation
dc.title	Choice of plausible alternatives dataset in Macedonian COPA-MK
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor	Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor	Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other
size.info	3 files
size.info	1000 items
size.info	258350 bytes
files.count	3
files.size	259292
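
The description above says the data ships as three JSONL files in the COPA-HR format, but it does not list the field names. The following is a minimal sketch of how one instance might be read, assuming the XCOPA-style keys `premise`, `choice1`, `choice2`, `question`, `label` and `idx`, and assuming that `label` is 0-based (0 for the first choice, 1 for the second). Both assumptions should be checked against the actual files. The sample line is built from the example given in the description and is not copied from the dataset.

```python
import io
import json

# Hypothetical sample line: the key names and the 0-based label are
# assumptions taken from the XCOPA layout, not confirmed by this record.
SAMPLE_JSONL = (
    '{"premise": "My body cast a shadow over the grass.", '
    '"choice1": "The sun was rising.", '
    '"choice2": "The grass was cut.", '
    '"question": "cause", "label": 0, "idx": 1}\n'
)


def read_jsonl(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]


def plausible_choice(instance):
    """Return the text of the choice that the label marks as more plausible.

    Assumes label 0 refers to choice1 and label 1 refers to choice2.
    """
    return instance["choice1"] if instance["label"] == 0 else instance["choice2"]


records = read_jsonl(io.StringIO(SAMPLE_JSONL))
for record in records:
    print(record["question"], "->", plausible_choice(record))
```

To read a downloaded split instead of the in-memory sample, the same helper can be given an open file, for example `read_jsonl(open(path, encoding="utf-8"))`, where `path` points to one of the three JSONL files. The file names are not stated in this record.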