dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2021-02-25T09:53:51Z
dc.date.available 2021-02-25T09:53:51Z
dc.date.issued 2021-02-24
dc.identifier.uri http://hdl.handle.net/11356/1404
dc.description The COPA-HR dataset (Choice of plausible alternatives in Croatian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html) that follows the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1000 premises (e.g. "My body cast a shadow over the grass"), each paired with a question ("What is the cause?") and two choices ("The sun was rising"; "The grass was cut"), together with a label encoding which of the choices the annotator or translator judged more plausible ("The sun was rising"). The observed agreement between the English annotator and the Croatian translator is perfect on the training and validation datasets; on the test dataset one label differs (agreement of 99.8%). The current state of the art on this dataset is held by the BERTić model (https://huggingface.co/CLASSLA/bcms-bertic), which achieves an accuracy of 66% (the random baseline is 50%).
dc.language.iso hrv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://arxiv.org/abs/2104.09243
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/k-centre/
dc.subject commonsense reasoning
dc.subject manual annotation
dc.subject manual translation
dc.title Choice of plausible alternatives dataset in Croatian COPA-HR
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 1000 items
files.count 3
files.size 198859
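
The item structure given in dc.description (premise, question, two choices, label) can be illustrated with a short Python sketch. The field names used here ("premise", "choice1", "choice2", "question", "label") and the 0-based label follow the XCOPA convention and are an assumption, not something this record confirms; check them against the actual .jsonl files. The sample record is built from the example in the description. The agreement helper shows the arithmetic behind the reported figure: one differing label at 99.8% agreement implies 500 test items, which is a derived number rather than one stated in this record.

```python
import json

# Hypothetical record: the field names follow the XCOPA convention and are an
# assumption; verify them against the real train.jsonl before relying on them.
SAMPLE_JSONL = (
    '{"premise": "My body cast a shadow over the grass.", '
    '"choice1": "The sun was rising.", "choice2": "The grass was cut.", '
    '"question": "cause", "label": 0}\n'
)


def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def plausible_choice(record):
    """Return the text of the choice selected by the (assumed 0-based) label."""
    return record["choice1"] if record["label"] == 0 else record["choice2"]


def observed_agreement(labels_a, labels_b):
    """Return the fraction of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


records = parse_jsonl(SAMPLE_JSONL)
print(plausible_choice(records[0]))

# One differing label out of an assumed 500 test items.
annotator = [0] * 500
translator = [0] * 499 + [1]
print(f"{observed_agreement(annotator, translator):.1%}")
```

To read a downloaded file instead of the inline sample, pass the file's contents to parse_jsonl, for example parse_jsonl(open("train.jsonl", encoding="utf-8").read()).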


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name train.jsonl
Size 78.5 KB
Format Unknown
Description Training dataset
MD5 8245d3658540d70ea37113ec15a59ecf

Name val.jsonl
Size 19.51 KB
Format Unknown
Description Validation dataset
MD5 ea2b76a3c368f8c638009ae5682abdf5

Name test.jsonl
Size 96.19 KB
Format Unknown
Description Test dataset
MD5 0ce4f77204be59eca3c141f97e67823d
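
The MD5 values listed for the three files can be used to check downloads for corruption. The following Python sketch is one way to do that, assuming the files were saved under their listed names in a single directory; the function names and the directory argument are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the three files in this item.
EXPECTED_MD5 = {
    "train.jsonl": "8245d3658540d70ea37113ec15a59ecf",
    "val.jsonl": "ea2b76a3c368f8c638009ae5682abdf5",
    "test.jsonl": "0ce4f77204be59eca3c141f97e67823d",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch),
    or None (file not found in the directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    status_text = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{name}: {status_text[status]}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.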
