dc.contributor.author | Borovič, Mladen |
dc.contributor.author | Žagar, Kristjan |
dc.contributor.author | Ferme, Marko |
dc.contributor.author | Majninger, Sandi |
dc.contributor.author | Ojsteršek, Milan |
dc.contributor.author | Šmajdek, Uroš |
dc.contributor.author | Zirkelbach, Maj |
dc.contributor.author | Zupanič, Matjaž |
dc.contributor.author | Jazbinšek, Meta |
dc.contributor.author | Žitnik, Slavko |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2023-01-28T17:54:39Z |
dc.date.available | 2023-01-28T17:54:39Z |
dc.date.issued | 2022-09-22 |
dc.identifier.uri | http://hdl.handle.net/11356/1756 |
dc.description | The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is either a segment of text (a span) from the corresponding reading passage, or the question is unanswerable. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The English version of SQuAD2.0 was machine-translated into Slovene, and the translation was then manually reviewed and corrected where needed. The data is provided in JSON format and consists of a training set and a validation set. |
dc.language.iso | slv |
dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.publisher | Faculty of Arts, University of Ljubljana |
dc.relation.isreferencedby | https://rajpurkar.github.io/SQuAD-explorer/ |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://rsdo.slovenscina.eu/ |
dc.subject | Q&A |
dc.subject | SQuAD |
dc.subject | natural language processing |
dc.title | Slovene translation of the SQuAD2.0 dataset |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://slovenscina.eu/odgovarjanje-na-vprasanja |
contact.person | Mladen Borovič mladen.borovic@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
files.count | 2 |
files.size | 134666349 |
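The description says the data is JSON and mixes answerable with unanswerable questions. The sketch below counts both kinds. It assumes the Slovene files keep the layout of the original English SQuAD2.0 release (`data` → `paragraphs` → `qas`, with an `is_impossible` flag on each question); this record does not confirm that, so check the keys against the downloaded files. The function names `count_questions` and `load_and_count` are illustrative only.

```python
import json
from typing import Dict


def count_questions(squad: dict) -> Dict[str, int]:
    """Count answerable and unanswerable questions in a SQuAD2.0-style dict.

    Assumption: the dict follows the original SQuAD2.0 JSON layout.
    """
    counts = {"answerable": 0, "unanswerable": 0}
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # SQuAD2.0 flags questions with no supported answer via
                # is_impossible; a missing flag is treated as answerable.
                if qa.get("is_impossible", False):
                    counts["unanswerable"] += 1
                else:
                    counts["answerable"] += 1
    return counts


def load_and_count(path: str) -> Dict[str, int]:
    """Read a JSON file as UTF-8 (needed for Slovene characters) and count."""
    with open(path, encoding="utf-8") as handle:
        return count_questions(json.load(handle))
```

For example, `load_and_count("squad2-slo-mt-dev.json")` would return the two counts for the validation set, provided the file is in the working directory and follows the assumed layout.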
Files in this item

Name | squad2-slo-mt-dev.json |
Size | 14.02 MB |
Format | Unknown |
Description | SQuAD2.0-SLO-MT validation set |
MD5 | f2fd18434814cb67023376089b5b8a60 |

Name | squad2-slo-mt-train.json |
Size | 114.41 MB |
Format | Unknown |
Description | SQuAD2.0-SLO-MT training set |
MD5 | 4beaa280c4145b8b875116cc99a470a9 |
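
The MD5 values listed for the two files can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library. The checksums are copied from the listing above; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local paths are assumed to point at wherever the files were saved.

```python
import hashlib

# Checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "squad2-slo-mt-dev.json": "f2fd18434814cb67023376089b5b8a60",
    "squad2-slo-mt-train.json": "4beaa280c4145b8b875116cc99a470a9",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the full training set into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, name: str) -> bool:
    """Compare a local file against the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

A call such as `verify_download("squad2-slo-mt-train.json", "squad2-slo-mt-train.json")` returns `True` only if the local copy matches the listed checksum. MD5 is adequate for detecting accidental corruption but is not a guarantee against deliberate tampering.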