dc.contributor.author Štravs, Miha
dc.contributor.author Knez, Timotej
dc.contributor.author Žitnik, Slavko
dc.date.accessioned 2022-09-21T09:11:35Z
dc.date.available 2022-09-21T09:11:35Z
dc.date.issued 2022-09-15
dc.identifier.uri http://hdl.handle.net/11356/1685
dc.description The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It consists of documents from Slovene Wikipedia with annotated entities and relations. We constructed the annotations using a semi-supervised process based on linking the documents to the WikiData knowledge graph. The corpus contains 244,437 sentences from Slovene Wikipedia pages. We also provide 896 additional sentences collected from the 24ur.com news website with annotated and linked entities; these sentences contain no annotated relations and are intended for additional testing of the models. The entities in the corpus are linked to entities in the WikiData knowledge graph, which is useful for models that take advantage of additional knowledge from a knowledge graph. Altogether, the corpus comprises 245,333 sentences with 813,952 relations and 1,616,193 entities. The corpus consists of the following files:
- schema-definition.xsd: defines the structure of the XML documents containing the relation annotations
- wikipedia-train.xml: training portion of the Wikipedia corpus
- wikipedia-test.xml: testing portion of the Wikipedia corpus
- wikipedia-validation.xml: validation portion of the Wikipedia corpus
- 24ur.xml: additional sentences from 24ur.com news articles
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby http://hdl.handle.net/20.500.12556/RUL-138295
dc.relation.isreplacedby http://hdl.handle.net/11356/1730
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject Wikipedia
dc.subject semi-supervised
dc.subject semantic relations
dc.title Slovene corpus for general relation extraction SloREL 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
size.info 245333 sentences
size.info 41673426 bytes
files.count 1
files.size 41673426
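Because the training split alone is about 168 MB of XML, a streaming parser is a sensible way to sanity-check the counts given in the description (245,333 sentences, 813,952 relations, 1,616,193 entities). The sketch below uses Python's `xml.etree.ElementTree.iterparse`. The tag names `sentence`, `entity` and `relation` are HYPOTHETICAL placeholders: the actual element names are defined in `schema-definition.xsd` and should be substituted before use.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL tag names; replace them with the element names
# actually declared in schema-definition.xsd.
DEFAULT_TAGS = ("sentence", "entity", "relation")


def count_elements(source, tags=DEFAULT_TAGS):
    """Stream-parse an XML file (a path or a binary file object) and
    count elements whose local tag name is in `tags`."""
    counts = {tag: 0 for tag in tags}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop a possible "{namespace}" prefix from the tag name.
        local = elem.tag.rsplit("}", 1)[-1]
        if local in counts:
            counts[local] += 1
        # Release the element's contents once it has been counted,
        # so large files do not have to be held in memory.
        elem.clear()
    return counts
```

For example, `count_elements("SloREL/wikipedia-train.xml")` would return a dictionary of per-tag counts for the training split; summing the results over all four XML files should, under the assumed tag names, reproduce the totals above.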


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Distributed under Creative Commons Attribution Required
Name: SloREL.zip
Size: 39.74 MB
Format: application/zip
Description: SloREL corpus
MD5: c6b8df362fdd5465cb936db555ff0575
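The MD5 checksum listed for SloREL.zip can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib` module follows; it assumes the archive has been saved locally, and the function names are illustrative only.

```python
import hashlib

# MD5 checksum published in this item record for SloREL.zip.
EXPECTED_MD5 = "c6b8df362fdd5465cb936db555ff0575"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("SloREL.zip")` returns `True` only if the local copy is byte-identical to the published archive (41,673,426 bytes).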
  • SloREL
    • 24ur.xml (300 kB)
    • wikipedia-test.xml (10 MB)
    • wikipedia-validation.xml (20 MB)
    • wikipedia-train.xml (168 MB)
    • schema-definition.xsd (1 kB)
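Since the XML files are considerably larger uncompressed (the training split is about 168 MB), it can be convenient to inspect them directly inside the archive rather than extracting it. The sketch below uses Python's standard `zipfile` module. It assumes, based on the file preview above, that members are stored under a `SloREL/` folder (e.g. `SloREL/24ur.xml`); the exact member paths should be checked against the listing the first function returns.

```python
import zipfile


def list_xml_members(zip_path):
    """Return (member name, uncompressed size in bytes) for every
    .xml or .xsd entry in the archive, in archive order."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.endswith((".xml", ".xsd"))
        ]


def read_member_head(zip_path, member, n_bytes=200):
    """Read the first n_bytes of one member without extracting the archive."""
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as fh:
        return fh.read(n_bytes)
```

For instance, `read_member_head("SloREL.zip", "SloREL/schema-definition.xsd", 1000)` would show the start of the schema. The file object returned by `ZipFile.open` can also be passed straight to a streaming XML parser.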
