Show simple item record

 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Koloski, Boshko
dc.contributor.author Zdravkovska, Kristina
dc.contributor.author Kuzman, Taja
dc.date.accessioned 2022-10-22T12:13:05Z
dc.date.available 2022-10-22T12:13:05Z
dc.date.issued 2022-09-26
dc.identifier.uri http://hdl.handle.net/11356/1687
dc.description The COPA-MK dataset (Choice of Plausible Alternatives in Macedonian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html), produced following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1,000 premises (e.g. "My body cast a shadow over the grass"), each paired with a question ("What is the cause?" or "What happened as a result?") and two choices ("The sun was rising"; "The grass was cut"), with a label encoding which of the two choices is more plausible according to the annotator or translator (here, "The sun was rising"). The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files. Translation quality was ensured with the help of the ReLDI Centre Belgrade.
dc.language.iso mkd
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/k-centre/
dc.subject commonsense reasoning
dc.subject manual annotation
dc.subject manual translation
dc.title Choice of plausible alternatives dataset in Macedonian COPA-MK
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other
size.info 3 files
size.info 1000 items
size.info 258350 bytes
files.count 3
files.size 259292
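The description above says each instance has a premise, a question, two choices and a label, stored one instance per line in JSONL files. A minimal sketch of reading such records follows. The field names (`premise`, `choice1`, `choice2`, `question`, `label`, `idx`) and the 0-based label are assumptions borrowed from the XCOPA convention; this record does not list them, so check them against the downloaded files. The sample line is built from the English example in the description and is not taken from the dataset.

```python
import json
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def gold_choice(record: dict) -> str:
    """Return the text of the choice labeled as more plausible.

    Assumption: XCOPA-style fields 'choice1' and 'choice2', with
    'label' equal to 0 for choice1 and 1 for choice2.
    """
    return record["choice1"] if record["label"] == 0 else record["choice2"]


# Illustrative record only (hypothetical field names, English example text).
sample = (
    '{"premise": "My body cast a shadow over the grass.", '
    '"choice1": "The sun was rising.", '
    '"choice2": "The grass was cut.", '
    '"question": "cause", "label": 0, "idx": 1}'
)

records = parse_jsonl([sample])
print(gold_choice(records[0]))
```

To read a downloaded file instead of the sample, pass an open file handle, for example `parse_jsonl(open("train.jsonl", encoding="utf-8"))`; UTF-8 is needed because the Macedonian text is written in Cyrillic.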


 Files in this item

 Download all files in this item (253.21 KB)
Name train.jsonl
Size 103.31 KB
Format Unknown
Description Training dataset
MD5 d7577c3804a32edf7169f5f060afa6e4

Name val.jsonl
Size 25.47 KB
Format Unknown
Description Validation dataset
MD5 dcfcdad1cabb3e2ee08415e4d460d62e

Name test.jsonl
Size 124.43 KB
Format Unknown
Description Test dataset
MD5 cc6011a17a24c1e8f233aeeb797620d5
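
The MD5 checksums listed for the three files can be used to confirm that a download is intact. The sketch below is one way to do that with Python's standard `hashlib`. The function names `md5_of_file` and `verify_downloads` and the assumption that all three files sit in one local directory are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "train.jsonl": "d7577c3804a32edf7169f5f060afa6e4",
    "val.jsonl": "dcfcdad1cabb3e2ee08415e4d460d62e",
    "test.jsonl": "cc6011a17a24c1e8f233aeeb797620d5",
}


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify_downloads(Path("."))` in the download directory returns a dictionary with one boolean per file; a `False` value means the file is missing or differs from the published checksum.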
