dc.contributor.author Borovič, Mladen
dc.contributor.author Žagar, Kristjan
dc.contributor.author Ferme, Marko
dc.contributor.author Majninger, Sandi
dc.contributor.author Ojsteršek, Milan
dc.contributor.author Šmajdek, Uroš
dc.contributor.author Zirkelbach, Maj
dc.contributor.author Zupanič, Matjaž
dc.contributor.author Jazbinšek, Meta
dc.contributor.author Žitnik, Slavko
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2023-01-28T17:54:39Z
dc.date.available 2023-01-28T17:54:39Z
dc.date.issued 2022-09-22
dc.identifier.uri http://hdl.handle.net/11356/1756
dc.description The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. Each question is either answered by a segment of text (a span) from the corresponding reading passage or is unanswerable. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The English version of SQuAD2.0 was machine-translated into Slovene, and the translation was then manually reviewed and corrected where needed. The data is provided in JSON format and consists of a training set and a validation set.
dc.language.iso slv
dc.publisher Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.publisher Faculty of Arts, University of Ljubljana
dc.relation.isreferencedby https://rajpurkar.github.io/SQuAD-explorer/
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/
dc.subject dataset
dc.subject Q&A
dc.subject SQuAD
dc.subject natural language processing
dc.title Slovene translation of the SQuAD2.0 dataset
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://slovenscina.eu/odgovarjanje-na-vprasanja
contact.person Mladen Borovič mladen.borovic@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 2
files.size 134666349
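
The description in this record says the data is provided in JSON format and mixes answerable with unanswerable questions. The sketch below shows one way to count the two kinds. It assumes the Slovene files keep the layout of the original English SQuAD2.0 release (`data` > `paragraphs` > `qas`, with an `is_impossible` flag on each question); this record does not confirm that layout. The helper names `load_squad` and `count_questions` are made up for illustration, and the sample is hand-written, not taken from the dataset.

```python
import json
from collections import Counter


def load_squad(path: str) -> dict:
    """Read a SQuAD-style JSON file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_questions(squad: dict) -> Counter:
    """Count answerable and unanswerable questions.

    Assumes the English SQuAD2.0 layout: data -> paragraphs -> qas,
    where unanswerable questions carry is_impossible = true.
    """
    counts = Counter()
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                impossible = qa.get("is_impossible", False)
                counts["unanswerable" if impossible else "answerable"] += 1
    return counts


# Hand-made sample in the assumed layout (not real dataset content).
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Slovenija",
            "paragraphs": [
                {
                    "context": "Ljubljana je glavno mesto Slovenije.",
                    "qas": [
                        {
                            "id": "example-1",
                            "question": "Katero mesto je glavno mesto Slovenije?",
                            "answers": [{"text": "Ljubljana", "answer_start": 0}],
                            "is_impossible": False,
                        },
                        {
                            "id": "example-2",
                            "question": "Koliko prebivalcev ima Maribor?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}

print(count_questions(sample))
```

For the real files, `count_questions(load_squad("squad2-slo-mt-dev.json"))` would apply the same count, provided the layout assumption holds.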


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name squad2-slo-mt-dev.json
Size 14.02 MB
Format Unknown
Description SQuAD2.0-SLO-MT validation set
MD5 f2fd18434814cb67023376089b5b8a60

Name squad2-slo-mt-train.json
Size 114.41 MB
Format Unknown
Description SQuAD2.0-SLO-MT training set
MD5 4beaa280c4145b8b875116cc99a470a9
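
The MD5 values listed for the two files can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify` are illustrative, and the script assumes the files sit in the current working directory under the names shown in this record.

```python
import hashlib
import os

# MD5 values as listed for the two files in this record.
EXPECTED_MD5 = {
    "squad2-slo-mt-dev.json": "f2fd18434814cb67023376089b5b8a60",
    "squad2-slo-mt-train.json": "4beaa280c4145b8b875116cc99a470a9",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, filename: str) -> bool:
    """Compare a local file's MD5 with the value listed for `filename`."""
    return md5_of_file(path) == EXPECTED_MD5[filename]


if __name__ == "__main__":
    # Assumes the downloads are in the current directory (an assumption).
    for name in EXPECTED_MD5:
        if os.path.exists(name):
            print(name, "OK" if verify(name, name) else "MISMATCH")
        else:
            print(name, "not found, skipped")
```

Reading in chunks keeps memory use low for the larger training file. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against tampering.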
