Slovene translation of the SQuAD2.0 dataset

License: https://creativecommons.org/licenses/by/4.0/

Borovič, Mladen; Žagar, Kristjan; Ferme, Marko; Majninger, Sandi; Ojsteršek, Milan; Šmajdek, Uroš; Zirkelbach, Maj; Zupanič, Matjaž; Jazbinšek, Meta; Žitnik, Slavko; Robnik-Šikonja, Marko

dc.contributor.author	Borovič, Mladen
dc.contributor.author	Žagar, Kristjan
dc.contributor.author	Ferme, Marko
dc.contributor.author	Majninger, Sandi
dc.contributor.author	Ojsteršek, Milan
dc.contributor.author	Šmajdek, Uroš
dc.contributor.author	Zirkelbach, Maj
dc.contributor.author	Zupanič, Matjaž
dc.contributor.author	Jazbinšek, Meta
dc.contributor.author	Žitnik, Slavko
dc.contributor.author	Robnik-Šikonja, Marko
dc.date.accessioned	2023-01-28T17:54:39Z
dc.date.available	2023-01-28T17:54:39Z
dc.date.issued	2022-09-22
dc.identifier.uri	http://hdl.handle.net/11356/1756
dc.description	The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. Each question is either answered by a segment of text (a span) from the corresponding reading passage or is unanswerable. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The English version of SQuAD2.0 was machine-translated into Slovene, and the translation was then manually reviewed and corrected where needed. The data is provided in JSON format and consists of a training set and a validation set.
dc.language.iso	slv
dc.publisher	Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.publisher	Faculty of Arts, University of Ljubljana
dc.relation.isreferencedby	https://rajpurkar.github.io/SQuAD-explorer/
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.source.uri	https://rsdo.slovenscina.eu/
dc.subject	Q&A
dc.subject	SQuAD
dc.subject	natural language processing
dc.title	Slovene translation of the SQuAD2.0 dataset
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
demo.uri	https://slovenscina.eu/odgovarjanje-na-vprasanja
contact.person	Mladen Borovič; mladen.borovic@um.si; Faculty of Electrical Engineering and Computer Science, University of Maribor
sponsor	Ministry of Culture; C3340-20-278001; Development of Slovene in a Digital Environment; Other
files.count	2
files.size	134666349
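
The description above says the data is provided in JSON format as a training set and a validation set. The following is a minimal sketch of how such files might be read, assuming the translation keeps the original SQuAD2.0 JSON layout (`data` > `paragraphs` > `qas`, with an `is_impossible` flag on each question). That layout, the function names, and the toy record are assumptions for illustration only; the record is made up and is not taken from the dataset, and this metadata record does not state the actual file names.

```python
import json
from typing import Iterator


def load_squad(path: str) -> dict:
    """Read a SQuAD-style JSON file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_questions(squad: dict) -> Iterator[dict]:
    """Flatten the assumed data > paragraphs > qas nesting into one dict per question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": paragraph["context"],
                    "answers": [a["text"] for a in qa.get("answers", [])],
                    # Unanswerable questions are flagged in SQuAD2.0-style files.
                    "is_impossible": qa.get("is_impossible", False),
                }


def count_answerable(squad: dict) -> tuple[int, int]:
    """Return (answerable, unanswerable) question counts."""
    answerable = unanswerable = 0
    for record in iter_questions(squad):
        if record["is_impossible"]:
            unanswerable += 1
        else:
            answerable += 1
    return answerable, unanswerable


# Toy record in the assumed layout (invented, not part of the dataset).
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Ljubljana",
            "paragraphs": [
                {
                    "context": "Ljubljana je glavno mesto Slovenije.",
                    "qas": [
                        {
                            "id": "toy-1",
                            "question": "Katero mesto je glavno mesto Slovenije?",
                            "answers": [{"text": "Ljubljana", "answer_start": 0}],
                            "is_impossible": False,
                        },
                        {
                            "id": "toy-2",
                            "question": "Koliko prebivalcev ima Maribor?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}

records = list(iter_questions(sample))
counts = count_answerable(sample)
```

For the real files, `load_squad` would be called with the path of the downloaded training or validation file and its result passed to `iter_questions`. If the published files deviate from the original SQuAD2.0 layout, the key names in `iter_questions` would need to be adjusted.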