Slovenian Word in Context dataset SloWiC 1.0

Name: Slovenian Word in Context dataset SloWiC 1.0
License: https://creativecommons.org/licenses/by-sa/4.0/

Knez, Timotej; Žitnik, Slavko

dc.contributor.author	Knez, Timotej
dc.contributor.author	Žitnik, Slavko
dc.date.accessioned	2023-03-27T10:19:55Z
dc.date.available	2023-03-27T10:19:55Z
dc.date.issued	2023-03-23
dc.identifier.uri	http://hdl.handle.net/11356/1781
dc.description	SloWiC is a Slovenian dataset for the Word in Context (WiC) task. Each example consists of a target word with multiple meanings and two sentences that both contain it, annotated with a label indicating whether the two sentences use the same meaning of the target word. The dataset contains 1808 manually annotated sentence pairs and an additional 13150 automatically annotated pairs intended to help with training larger models. The data is stored in JSON, following the format used in the SuperGLUE version of the Word in Context task (https://super.gluebenchmark.com/). Each example contains the following fields:
- word: the target word with multiple meanings
- sentence1: the first sentence containing the target word
- sentence2: the second sentence containing the target word
- idx: the index of the example in the dataset
- label: indicates whether the two sentences use the same meaning of the target word
- start1: start of the target word in the first sentence
- start2: start of the target word in the second sentence
- end1: end of the target word in the first sentence
- end2: end of the target word in the second sentence
- version: the version of the annotation
- manual_annotation: Boolean indicating whether the label was manually annotated
- group: the group of annotators that labelled the example
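The field list above can be illustrated with a short reading sketch. It assumes, as in the SuperGLUE WiC release, that each file holds one JSON object per line (JSON Lines) and that start/end values are character offsets with an exclusive end; neither assumption is confirmed by this record, and the sample sentences and the helper names (read_examples, target_spans, split_by_annotation) are hypothetical, made up for illustration. Because Slovenian is highly inflected, the span found via the offsets may differ from the `word` field, so the sketch reads the span rather than searching for the word.

```python
import json
from typing import Iterable, Iterator


def read_examples(lines: Iterable[str]) -> Iterator[dict]:
    """Parse records, ASSUMING one JSON object per line (JSON Lines); blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def target_spans(example: dict) -> tuple[str, str]:
    """Return the target-word occurrence in each sentence.

    ASSUMPTION: start/end are character offsets and end is exclusive, as in SuperGLUE WiC.
    """
    span1 = example["sentence1"][example["start1"]:example["end1"]]
    span2 = example["sentence2"][example["start2"]:example["end2"]]
    return span1, span2


def split_by_annotation(examples: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate manually annotated examples from automatically annotated ones."""
    manual, automatic = [], []
    for ex in examples:
        (manual if ex["manual_annotation"] else automatic).append(ex)
    return manual, automatic


if __name__ == "__main__":
    # Hypothetical record, not taken from the dataset: "list" as a leaf vs. a sheet of paper.
    sample = json.dumps({
        "word": "list",
        "sentence1": "Z drevesa je padel list.",
        "sentence2": "Na mizi je ležal list papirja.",
        "idx": 0, "label": False,
        "start1": 19, "end1": 23, "start2": 17, "end2": 21,
        "version": 1, "manual_annotation": True, "group": 1,
    }, ensure_ascii=False)
    for ex in read_examples([sample]):
        print(target_spans(ex), ex["label"])
```

When reading a real file, pass an open text handle opened with `encoding="utf-8"` to `read_examples`; filtering on `manual_annotation` is one way to separate the 1808 manually labelled pairs from the automatically labelled ones, for example to keep evaluation on manual labels only.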
dc.language.iso	slv
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.subject	word in context
dc.subject	SuperGLUE
dc.title	Slovenian Word in Context dataset SloWiC 1.0
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
size.info	14958 items
files.count	2
files.size	7687411