
 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Starović, Mirjana
dc.contributor.author Kuzman, Taja
dc.contributor.author Samardžić, Tanja
dc.date.accessioned 2022-11-15T08:54:16Z
dc.date.available 2022-11-15T08:54:16Z
dc.date.issued 2022-11-15
dc.identifier.uri http://hdl.handle.net/11356/1708
dc.description The COPA-SR dataset (Choice of plausible alternatives in Serbian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html), produced following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), with a label encoding which of the choices is more plausible according to the annotator or translator (The sun was rising). The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404) and the Macedonian COPA-MK dataset (http://hdl.handle.net/11356/1687). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files. The dataset was translated by the ReLDI Centre Belgrade (https://reldi.spur.uzh.ch/).
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/k-centre/
dc.subject commonsense reasoning
dc.subject manual annotation
dc.subject manual translation
dc.title Choice of plausible alternatives dataset in Serbian COPA-SR
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other
size.info 3 files
size.info 1000 items
size.info 258048 bytes
files.count 3
files.size 249311
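
The description above says the data ships as JSONL files with a premise, a question, two choices and a label per instance. A minimal sketch of reading such a file follows. The field names (premise, choice1, choice2, question, label) and the 0/1 label encoding are assumptions based on the XCOPA-style format the description refers to; they are not stated in this record and should be checked against the downloaded files. The helper names load_jsonl and plausible_choice are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def plausible_choice(record):
    """Return the text of the choice marked as more plausible.

    Assumption (not confirmed by this record): label 0 points to
    choice1 and label 1 points to choice2, as in XCOPA-style data.
    """
    return record["choice1"] if record["label"] == 0 else record["choice2"]
```

For example, load_jsonl("train.jsonl") would be expected to return 400 records if the split sizes in the description hold, and plausible_choice could then be applied to each record.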


 Files in this item

 Download all files in item (243.47 KB)
This item is publicly available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: train.jsonl
Size: 98.42 KB
Format: Unknown
Description: Training dataset
MD5: eaf214a706aa5a1c23c6d48cf1eac7aa

Name: val.jsonl
Size: 24.45 KB
Format: Unknown
Description: Validation dataset
MD5: 41c4fd55fc303555dc04328cba87d78a

Name: test.jsonl
Size: 120.6 KB
Format: Unknown
Description: Test dataset
MD5: 86d5255b3c2ca0d1659412ea598b366a
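
The MD5 values listed for the three files can be used to check a download. The sketch below uses Python's standard hashlib module. The checksums are copied from this record, while the helper names md5_of_file and verify_downloads and the assumption that the files sit in one local directory under their listed names are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "train.jsonl": "eaf214a706aa5a1c23c6d48cf1eac7aa",
    "val.jsonl": "41c4fd55fc303555dc04328cba87d78a",
    "test.jsonl": "86d5255b3c2ca0d1659412ea598b366a",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Return {file name: bool}; a missing file counts as a failed check."""
    directory = Path(directory)
    results = {}
    for name, checksum in expected.items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling verify_downloads(".") in the download directory returns one entry per file; any False value means the file is missing or its checksum does not match the listing.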
