
 
dc.contributor.author Klemen, Matej
dc.contributor.author Žagar, Aleš
dc.contributor.author Čibej, Jaka
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2024-03-22T13:32:55Z
dc.date.available 2024-03-22T13:32:55Z
dc.date.issued 2024-03-19
dc.identifier.uri http://hdl.handle.net/11356/1934
dc.description SI-NLI-en is an English translation of the SI-NLI Slovene Natural Language Inference Dataset (http://hdl.handle.net/11356/1707). To compile it, all premises and hypotheses from SI-NLI were first machine-translated into English with DeepL. The machine translations were then manually checked and corrected by a group of 7 translation students at the University of Ljubljana. Each translator was given the Slovene premise together with all of its hypotheses, along with their English translations, so the translations were checked as units rather than in isolation to ensure maximum semantic coherence. Like SI-NLI, SI-NLI-en contains 5,937 sentence pairs (premise and hypothesis), each manually labeled as "entailment", "contradiction", or "neutral". The dataset is split into train, validation, and test sets of 4,392, 547, and 998 pairs, respectively. It is released in a tabular TSV format; the 00README.txt file describes the attributes. The test set contains only the premise and hypothesis (no annotations) because SI-NLI-en is integrated into the Slovene evaluation framework SloBENCH (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting your test set predictions to SloBENCH to obtain an evaluation score and see how it compares to others.
dc.language.iso eng
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.subject natural language inference
dc.title English translation of the Slovene Natural Language Inference Dataset SI-NLI-en 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Klemen matej.klemen@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
contact.person Aleš Žagar ales.zagar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
contact.person Jaka Čibej jaka.cibej@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 5937 units
files.count 1
files.size 382500
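The description states that the splits are plain TSV files with 4,392 (train), 547 (validation), and 998 (test) pairs. A minimal sketch of how one might load a split and sanity-check those sizes is given below. It assumes each file starts with a header row, and the column name `label` is hypothetical; the actual attribute names are documented in 00README.txt and should be substituted. The directory name `SI-NLI-en_1.0` is taken from the file preview in this record.

```python
import csv
from collections import Counter
from pathlib import Path


def read_split(path):
    """Read one TSV split into a list of dicts keyed by the header row.

    Assumption: the first line of the file is a header row.
    """
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def label_counts(rows, label_column="label"):
    """Count labels per class.

    `label_column` is a hypothetical name; rows that lack it (for example,
    the unannotated test split) are skipped.
    """
    return Counter(row[label_column] for row in rows if row.get(label_column))


if __name__ == "__main__":
    data_dir = Path("SI-NLI-en_1.0")  # folder name as shown in the file preview
    # Split sizes as stated in the record's description.
    expected = {"train.tsv": 4392, "dev.tsv": 547, "test.tsv": 998}
    for name, size in expected.items():
        split_path = data_dir / name
        if not split_path.exists():
            print(f"{name}: not found, skipping")
            continue
        rows = read_split(split_path)
        print(f"{name}: {len(rows)} rows (expected {size})")
        print(f"  labels: {dict(label_counts(rows))}")
```

If the row counts differ from the stated sizes, the header-row assumption is the first thing to check against 00README.txt.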


 Files in this item

Name: SI-NLI-en_1.0.zip
Size: 373.54 KB
Format: application/zip
Description: SI-NLI-en 1.0 (ZIP)
MD5: 5d3c5643e9401a4d13360f5396058ab4

 File preview
  • SI-NLI-en_1.0
    • test.tsv (209 kB)
    • dev.tsv (134 kB)
    • train.tsv (1 MB)
    • 00README.txt (624 B)
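
The record lists an MD5 checksum for SI-NLI-en_1.0.zip. A minimal sketch of how a downloaded copy could be checked against it with Python's standard `hashlib` module is shown below. The local path is an assumption (the archive saved under its original name in the current directory), and the helper name `md5_of_file` is introduced here only for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record for SI-NLI-en_1.0.zip.
EXPECTED_MD5 = "5d3c5643e9401a4d13360f5396058ab4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("SI-NLI-en_1.0.zip")  # assumed local download location
    if archive.exists():
        actual = md5_of_file(archive)
        status = "OK" if actual == EXPECTED_MD5 else "MISMATCH"
        print(f"{archive.name}: {actual} ({status})")
    else:
        print(f"{archive.name}: not found")
```

MD5 is used here only because it is the checksum the repository publishes; it detects corrupted or truncated downloads but is not a defense against deliberate tampering.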
