dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Koloski, Boshko |
dc.contributor.author | Zdravkovska, Kristina |
dc.contributor.author | Kuzman, Taja |
dc.date.accessioned | 2022-10-22T12:13:05Z |
dc.date.available | 2022-10-22T12:13:05Z |
dc.date.issued | 2022-09-26 |
dc.identifier.uri | http://hdl.handle.net/11356/1687 |
dc.description | The COPA-MK dataset (Choice of plausible alternatives in Macedonian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html), produced by following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), with a label indicating which of the two choices the annotator or translator judged more plausible (The sun was rising). The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files. Translation quality was ensured with the help of the ReLDI Centre Belgrade. |
dc.language.iso | mkd |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.clarin.si/info/k-centre/ |
dc.subject | commonsense reasoning |
dc.subject | manual annotation |
dc.subject | manual translation |
dc.title | Choice of plausible alternatives dataset in Macedonian COPA-MK |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other |
size.info | 3 files |
size.info | 1000 items |
size.info | 258350 bytes |
files.count | 3 |
files.size | 259292 |
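The description above says each instance has a premise, a question, two choices and a label, stored one per line in JSONL files. A minimal sketch of reading such records follows. The field names (`premise`, `choice1`, `choice2`, `question`, `label`, `idx`) and the 0-based label are assumptions taken from the common COPA/XCOPA JSONL layout, not confirmed by this record, and the sample line reuses the English example from the description rather than an actual Macedonian instance.

```python
import json

# Illustrative record only. Field names and the 0-based label follow the
# usual COPA/XCOPA JSONL layout and are an ASSUMPTION; check them against
# the downloaded files before relying on them.
SAMPLE_LINE = json.dumps({
    "premise": "My body cast a shadow over the grass.",
    "choice1": "The sun was rising.",
    "choice2": "The grass was cut.",
    "question": "cause",
    "label": 0,
    "idx": 1,
})


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_split(path):
    """Read one split (for example train.jsonl) from disk as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


def plausible_choice(instance):
    """Return the text of the choice that the label marks as more plausible."""
    return instance["choice1"] if instance["label"] == 0 else instance["choice2"]


instances = parse_jsonl([SAMPLE_LINE])
print(plausible_choice(instances[0]))
```

With the real files, `load_split("train.jsonl")` would be expected to return 400 records if the layout matches the assumption above.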
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)




- Name: train.jsonl
- Size: 103.31 KB
- Format: Unknown
- Description: Training dataset
- MD5: d7577c3804a32edf7169f5f060afa6e4

- Name: val.jsonl
- Size: 25.47 KB
- Format: Unknown
- Description: Validation dataset
- MD5: dcfcdad1cabb3e2ee08415e4d460d62e

- Name: test.jsonl
- Size: 124.43 KB
- Format: Unknown
- Description: Test dataset
- MD5: cc6011a17a24c1e8f233aeeb797620d5