
 
dc.contributor.author Knez, Timotej
dc.contributor.author Žitnik, Slavko
dc.date.accessioned 2023-03-27T10:19:55Z
dc.date.available 2023-03-27T10:19:55Z
dc.date.issued 2023-03-23
dc.identifier.uri http://hdl.handle.net/11356/1781
dc.description The SloWiC dataset is a Slovenian dataset for the Word in Context task. Each example contains a target word with multiple meanings and two sentences that both contain the target word. Each example is also annotated with a label indicating whether both sentences use the target word in the same meaning. The dataset contains 1808 manually annotated sentence pairs and an additional 13150 automatically annotated pairs to help with training larger models. The dataset is stored in JSON, following the format used in the SuperGLUE version of the Word in Context task (https://super.gluebenchmark.com/). Each example contains the following data fields:
- word: The target word with multiple meanings
- sentence1: The first sentence containing the target word
- sentence2: The second sentence containing the target word
- idx: The index of the example in the dataset
- label: Label showing if the sentences contain the same meaning of the target word
- start1: Start of the target word in the first sentence
- start2: Start of the target word in the second sentence
- end1: End of the target word in the first sentence
- end2: End of the target word in the second sentence
- version: The version of the annotation
- manual_annotation: Boolean showing if the label was manually annotated
- group: The group of annotators that labelled the example
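The field list above can be checked against loaded data with a short script. The following is a minimal sketch, not part of the dataset distribution. It rests on two assumptions the record does not confirm: that each file is either a single JSON array or one JSON object per line, and that start1/end1 and start2/end2 are character offsets with an exclusive end. The helper names (`parse_examples`, `missing_fields`, `target_spans`) are hypothetical.

```python
import json

# Field names copied from the record description.
REQUIRED_FIELDS = {
    "word", "sentence1", "sentence2", "idx", "label",
    "start1", "start2", "end1", "end2",
    "version", "manual_annotation", "group",
}


def parse_examples(text):
    """Parse examples from text holding one JSON array or JSON Lines.

    Assumption: the record does not say which layout the files use,
    so both are accepted here.
    """
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def missing_fields(example):
    """Return the sorted names of documented fields absent from an example."""
    return sorted(REQUIRED_FIELDS - example.keys())


def target_spans(example):
    """Slice the target word out of both sentences.

    Assumption: offsets are character positions and the end is exclusive.
    """
    first = example["sentence1"][example["start1"]:example["end1"]]
    second = example["sentence2"][example["start2"]:example["end2"]]
    return first, second
```

To use it, read one of the downloaded files as UTF-8 text, pass the text to `parse_examples`, and inspect `missing_fields` and `target_spans` for a few examples. If the sliced spans do not look like forms of the target word, the offset assumption does not hold for this dataset.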
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.subject dataset
dc.subject word in context
dc.subject SuperGLUE
dc.title Slovenian Word in Context dataset SloWiC 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
size.info 14958 items
files.count 2
files.size 7687411


 Files in this item

 Total size of all files: 7.33 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: SloWiC.json
Size: 6.54 MB
Format: Unknown
Description: Entire SloWiC corpus
MD5: fa602a380e22097f901fcb8e21d1f826

Name: SloWiC_manually_annotated.json
Size: 806.26 KB
Format: Unknown
Description: Manually annotated part of the SloWiC corpus
MD5: 8c00ca623cc87ac5f40e2d1785b6e807
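The MD5 values listed for the two files can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module. The checksums are copied from this record, while the function names (`md5_of_file`, `matches_record`) and the local file paths are hypothetical.

```python
import hashlib

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "SloWiC.json": "fa602a380e22097f901fcb8e21d1f826",
    "SloWiC_manually_annotated.json": "8c00ca623cc87ac5f40e2d1785b6e807",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, name):
    """Return True if the file at `path` has the MD5 listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_record("SloWiC.json", "SloWiC.json")` should return True for an intact copy saved in the current directory. MD5 is adequate here for detecting transfer errors, not for protecting against deliberate tampering.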
