dc.contributor.author Mladenić Grobelnik, Adrian
dc.contributor.author Novak, Erik
dc.contributor.author Mladenić, Dunja
dc.contributor.author Grobelnik, Marko
dc.date.accessioned 2022-11-23T11:38:00Z
dc.date.available 2022-11-23T11:38:00Z
dc.date.issued 2022-11-23
dc.identifier.uri http://hdl.handle.net/11356/1724
dc.description The SloATOMIC 2020 corpus contains the Slovene translations of the ATOMIC 2020 data set, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. The translations were produced with the DeepL translation service; a selection of about 10k examples was also manually inspected and corrected. The corpus consists of 1,331,114 examples distributed across the train, validation, and test data sets. The corpus was created as part of work package 4 of the Slovene in the Digital Environment project.
    The corpus consists of the following files:
    - sloatomic_train.tsv: The training set.
    - sloatomic_dev.tsv: The validation set.
    - sloatomic_test.tsv.automatic_all: The test set containing all of the automatically translated examples.
    - sloatomic_test.tsv.automatic_10k: The selection of 10k examples from the complete test set.
    - sloatomic_test.tsv.manual_10k: The manually inspected and corrected versions of the examples in the automatic 10k subset.
    The data is in the TSV (tab-separated values) format. Each line contains one example. The columns are:
    - head_event: The head event of the example.
    - relation: The relation between the head event and the tail event. The relation is one of 23 different descriptors.
    - tail_event: The tail event of the example.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://ailab.ijs.si/dunja/SiKDD2022/Papers/SiKDD2022_paper_5674.pdf
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/E3-JSI/dataset-SloATOMIC-2020
dc.subject commonsense reasoning
dc.subject knowledge graph
dc.subject dataset
dc.title Slovene Translation of the Atomic 2020 data set SloATOMIC 2020
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Erik Novak erik.novak@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 1331114 entries
files.count 1
files.size 13216888


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: sloatomic2020.zip
Size: 12.6 MB
Format: application/zip
Description: SloATOMIC 2020 corpus
MD5: 19e9723933f89144daa012011a34ab9c
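
The MD5 value listed for sloatomic2020.zip can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and verify_download are hypothetical helpers introduced for illustration, and the local file path is whatever the archive was saved as.

```python
import hashlib

# MD5 checksum as published in this record for sloatomic2020.zip.
EXPECTED_MD5 = "19e9723933f89144daa012011a34ab9c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_download("sloatomic2020.zip") on the downloaded archive should return True if the file matches the checksum above. MD5 is suitable here only as an integrity check against transfer errors, not as protection against deliberate tampering.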
 File Preview
    • sloatomic_test.tsv.manual_10k (504 kB)
    • sloatomic_test.tsv.automatic_10k (523 kB)
    • sloatomic_dev.tsv (5 MB)
    • README.txt (1 kB)
    • sloatomic_train.tsv (53 MB)
    • sloatomic_test.tsv.automatic_all (7 MB)
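
The record describes each data file as tab-separated with three columns (head_event, relation, tail_event) and one example per line. The following is a minimal sketch of reading such a file with the Python standard library. It assumes UTF-8 encoding and no header row; neither is stated in the record, so the has_header parameter is provided in case the files do start with column names. The names Example, read_examples, load_examples, and relation_counts are hypothetical helpers introduced for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple


class Example(NamedTuple):
    """One corpus line: the three columns named in the record."""
    head_event: str
    relation: str
    tail_event: str


def read_examples(lines: Iterable[str], has_header: bool = False) -> Iterator[Example]:
    """Yield an Example for every well-formed three-column line."""
    # QUOTE_NONE treats quotation marks as ordinary characters,
    # so only tabs separate the fields.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: first row holds column names
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield Example(*row)


def load_examples(path: str, has_header: bool = False) -> List[Example]:
    """Read a whole file into memory (encoding is an assumption)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(read_examples(handle, has_header))


def relation_counts(examples: Iterable[Example]) -> Counter:
    """Count how many examples use each relation descriptor."""
    return Counter(example.relation for example in examples)
```

For instance, relation_counts(load_examples("sloatomic_dev.tsv")) would give the number of validation examples per relation. If the assumptions hold, the result should contain at most the 23 relation descriptors mentioned in the description.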
