dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2021-02-25T09:53:51Z
dc.date.available 2021-02-25T09:53:51Z
dc.date.issued 2021-02-24
dc.identifier.uri http://hdl.handle.net/11356/1404
dc.description The COPA-HR dataset (Choice of plausible alternatives in Croatian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html) following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1000 premises (e.g. My body cast a shadow over the grass), each paired with a question (What is the cause?) and two choices (The sun was rising; The grass was cut). A label encodes which of the two choices the annotator or translator judged more plausible (The sun was rising). Agreement between the English annotator and the Croatian translator is perfect on the training and validation datasets, with a single differing label on the test dataset (99.8% agreement). The current state of the art on this dataset is held by the BERTić model (https://huggingface.co/CLASSLA/bcms-bertic), which achieves an accuracy of 66% (random guessing yields 50%).
dc.language.iso hrv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://arxiv.org/abs/2104.09243
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/k-centre/
dc.subject commonsense reasoning
dc.subject manual annotation
dc.subject manual translation
dc.title Choice of plausible alternatives dataset in Croatian COPA-HR
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 1000 items
files.count 3
files.size 198859
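
The description implies a simple per-item record: a premise, a question (cause or effect), two choices and a gold label, scored by accuracy against a 50% random baseline. The sketch below is a minimal loader and scorer under an explicit assumption not stated in this record: that each line of the .jsonl files uses SuperGLUE-style COPA field names (`premise`, `question`, `choice1`, `choice2`, `label`, with `label` 0 meaning choice1). Check one line of train.jsonl before relying on it. The helper names `load_jsonl`, `accuracy` and `random_baseline` are illustrative only.

```python
import json
import random
from typing import Callable, Iterable

# ASSUMPTION: each line is a JSON object with SuperGLUE-style COPA fields
# "premise", "question" ("cause" or "effect"), "choice1", "choice2" and an
# integer "label" (0 -> choice1, 1 -> choice2). Verify against the real files.


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def accuracy(items: Iterable[dict], predict: Callable[[dict], int]) -> float:
    """Fraction of items for which predict(item) equals the gold label."""
    items = list(items)
    if not items:
        raise ValueError("no items to score")
    correct = sum(1 for item in items if predict(item) == item["label"])
    return correct / len(items)


def random_baseline(item: dict) -> int:
    # Two choices per item, so a uniform guess is right about half the time.
    return random.randint(0, 1)


if __name__ == "__main__":
    # The English example from the description; the actual files are in Croatian.
    example = {
        "premise": "My body cast a shadow over the grass.",
        "question": "cause",
        "choice1": "The sun was rising.",
        "choice2": "The grass was cut.",
        "label": 0,
    }
    print(accuracy([example], lambda item: 0))  # always picks choice1; prints 1.0
```

With real data, `accuracy(load_jsonl("test.jsonl"), random_baseline)` should land near 0.5, which is the baseline the 66% BERTić figure is compared against.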


Files in this item (194.2 KB in total)

Name: train.jsonl
Size: 78.5 KB
Format: Unknown
Description: Training dataset
MD5: 8245d3658540d70ea37113ec15a59ecf

Name: val.jsonl
Size: 19.51 KB
Format: Unknown
Description: Validation dataset
MD5: ea2b76a3c368f8c638009ae5682abdf5

Name: test.jsonl
Size: 96.19 KB
Format: Unknown
Description: Test dataset
MD5: 0ce4f77204be59eca3c141f97e67823d
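
Each file above is listed with an MD5 checksum. A minimal sketch for checking downloaded copies follows, assuming the files were saved under their listed names in one directory (the current directory by default). The function names `md5_of` and `verify` are hypothetical helpers, not part of any repository tooling; only Python's standard `hashlib` and `pathlib` modules are used.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "train.jsonl": "8245d3658540d70ea37113ec15a59ecf",
    "val.jsonl": "ea2b76a3c368f8c638009ae5682abdf5",
    "test.jsonl": "0ce4f77204be59eca3c141f97e67823d",
}


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path = Path(".")) -> dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or mismatched'}")
```

Reading in chunks keeps memory use flat; for files of this size (under 100 KB each) reading them whole would work equally well.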
