dc.contributor.author Pegan, Jasmina
dc.contributor.author Robnik-Šikonja, Marko
dc.contributor.author Kosem, Iztok
dc.contributor.author Gantar, Polona
dc.contributor.author Ponikvar, Primož
dc.contributor.author Laskowski, Cyprian
dc.date.accessioned 2022-10-26T08:33:12Z
dc.date.available 2022-10-26T08:33:12Z
dc.date.issued 2022-10-26
dc.identifier.uri http://hdl.handle.net/11356/1694
dc.description The Slovenian datasets for contextual synonym and antonym detection can be used to train machine learning classifiers, as described in the MSc thesis of Jasmina Pegan, "Semantic detection of synonyms and antonyms with contextual embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=141456). The datasets contain pairs of synonyms and antonyms in context, together with additional information on each sense pair. Candidates for synonyms and antonyms were retrieved from the dataset created in the BSc thesis of Jasmina Pegan, "Antonym detection with word embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=110533). Example sentences were retrieved from The comprehensive Slovenian-Hungarian dictionary (VSMS) (https://www.clarin.si/repository/xmlui/handle/11356/1453). Each dataset is class-balanced: it contains equal numbers of examples and counterexamples. An example is a pair of example sentences in which the two words are synonyms/antonyms. A counterexample is a pair of example sentences in which the two words are not synonyms/antonyms. Note that the word pair in a counterexample can still be synonymous or antonymous in some sense of the two words, just not in the given context. The datasets are divided into two categories: datasets for synonyms and datasets for antonyms. Each category is further divided into base and updated datasets, and each of these consists of three files: train, validation, and test. Base datasets include only manually reviewed sense pairs. They are generated from all pairs of VSMS sense examples for all confirmed pairs of antonym and synonym senses. Updated datasets include automatically generated sense pairs, with a limit on the maximum number of examples per word. As a result, an updated dataset is more balanced word-wise, but it is not fully manually reviewed and contains less accurate data. A single dataset entry contains information on the base word, followed by data on the synonym/antonym candidate.
The last column indicates whether the sense pair is a pair of synonyms/antonyms. More details can be found in the included README file.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://repozitorij.uni-lj.si/IzpisGradiva.php?id=141456
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/jasminapegan/antonym_detection
dc.subject synonyms
dc.subject antonyms
dc.subject dataset
dc.subject WSD
dc.subject WSI
dc.title Slovenian datasets for contextual synonym and antonym detection
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jasmina Pegan jasmina.pegan.fri@gmail.com Faculty of Computer and Information Science, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Republic of Slovenia, Ministry of Culture 3340-21-722002 Upgrading fundamental dictionary resources and databases of CJVT UL nationalFunds
size.info 79229 entries
files.count 1
files.size 4939201
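
The description states that each dataset is class-balanced and that the last column of an entry holds the synonym/antonym label. A minimal sketch of how that property could be checked is given below. It assumes the dataset files are tab-separated text with one entry per line; the record does not state the file format, so the delimiter, the helper names `count_labels` and `is_class_balanced`, and the placeholder rows are assumptions for illustration only. The included README is the authoritative source for the actual layout.

```python
from collections import Counter
from typing import Iterable


def count_labels(rows: Iterable[str], delimiter: str = "\t") -> Counter:
    """Count the values found in the last column of delimited text rows.

    Assumption: one entry per line, label in the last column.
    A header row, if the file has one, would be counted like any other row.
    """
    counts: Counter = Counter()
    for row in rows:
        line = row.rstrip("\r\n")
        if not line:  # skip blank lines
            continue
        counts[line.split(delimiter)[-1]] += 1
    return counts


def is_class_balanced(counts: Counter) -> bool:
    """Return True when every label occurs equally often."""
    return len(counts) > 0 and len(set(counts.values())) == 1


# Placeholder rows, invented for illustration; they are not taken from the dataset.
sample_rows = [
    "word_a\tsentence one\tword_b\tsentence two\t1\n",
    "word_c\tsentence three\tword_d\tsentence four\t0\n",
]
sample_counts = count_labels(sample_rows)
print(sample_counts, is_class_balanced(sample_counts))
```

To run the same check on a real file, pass an open file handle (opened with `encoding="utf-8"`) to `count_labels` in place of `sample_rows`.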


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name datasets.zip
Size 4.71 MB
Format application/zip
Description Dataset files
MD5 e1496af81fc71a19c8d23042f9f36c2e
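
The MD5 checksum and the file size listed in this record can be used to confirm that a downloaded copy of datasets.zip is intact. The following is a minimal sketch using only the Python standard library. It assumes that `files.size` is a byte count (4939201 bytes is about 4.71 MiB, which agrees with the listed size); the function names and the local path in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "e1496af81fc71a19c8d23042f9f36c2e"  # MD5 value listed in this record
EXPECTED_SIZE = 4939201  # files.size from this record, assumed to be bytes


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path) -> bool:
    """Check a local file against the size and MD5 given in the record."""
    file_path = Path(path)
    if file_path.stat().st_size != EXPECTED_SIZE:
        return False
    return md5_of_file(file_path) == EXPECTED_MD5


# Usage (hypothetical local path to the downloaded archive):
# print(matches_record("datasets.zip"))
```

MD5 is used here only because it is the checksum the record publishes; it detects accidental corruption but is not a safeguard against deliberate tampering.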