dc.contributor.author | Štravs, Miha |
dc.contributor.author | Knez, Timotej |
dc.contributor.author | Žitnik, Slavko |
dc.date.accessioned | 2022-11-25T22:28:25Z |
dc.date.available | 2022-11-25T22:28:25Z |
dc.date.issued | 2022-09-15 |
dc.identifier.uri | http://hdl.handle.net/11356/1730 |
dc.description | The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It contains documents from Slovene Wikipedia with annotated entities and relations. We constructed the annotations using a semi-supervised process based on linking the documents to the WikiData knowledge graph. The corpus contains 244,437 sentences from Slovene Wikipedia pages. We also provide 896 additional sentences collected from the 24ur.com news website with annotated and linked entities; these do not contain annotated relations and are meant for additional testing of the models. The entities in our corpus are linked to entities in the WikiData knowledge graph, which is useful for models that take advantage of additional knowledge from a knowledge graph. Altogether, the corpus comprises 245,333 sentences with 813,952 relations and 1,616,193 entities. The corpus consists of multiple files: - schema-definition.xsd: defines the structure of the XML documents containing relation annotations; - SloREL/train.xml: training portion of the SloREL corpus containing Wikipedia documents; - SloREL/test.xml: testing portion of the SloREL corpus containing Wikipedia documents; - SloREL/validation.xml: validation portion of the SloREL corpus containing Wikipedia documents; - 24ur/24ur.xml: additional sentences from 24ur.com news articles. Changes in version 1.1: We fixed the mislabeled relation types that were present in the previous version of the dataset. We also rearranged the archive to make the structure more understandable. |
dc.language.iso | slv |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.relation.isreferencedby | http://hdl.handle.net/20.500.12556/RUL-138295 |
dc.relation.replaces | http://hdl.handle.net/11356/1685 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | Wikipedia |
dc.subject | semi-supervised |
dc.subject | semantic relations |
dc.title | Slovene corpus for general relation extraction SloREL 1.1 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
size.info | 245333 sentences |
size.info | 40594288 bytes |
files.count | 1 |
files.size | 40594288 |
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

Files in this item
- Name: SloREL.zip
- Size: 38.71 MB
- Format: application/zip
- Description: SloREL corpus
- MD5: 024abbf45a37451c7f4543a5a6e18fa4
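
The size and MD5 checksum listed for SloREL.zip can be used to confirm that a downloaded copy is intact. The following is a minimal sketch, assuming the archive has been saved locally; the function names `md5_of_file` and `verify_download` are illustrative and not part of the corpus distribution. The expected values are copied from this record.

```python
import hashlib
import os

# Values copied from this record (files.size and the MD5 entry).
EXPECTED_MD5 = "024abbf45a37451c7f4543a5a6e18fa4"
EXPECTED_SIZE = 40594288  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if both the file size and the MD5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Calling `verify_download("SloREL.zip")` on a local copy (the path is an assumption about where the file was saved) returns `True` only when both checks pass. The size is compared first because it is cheap and catches truncated downloads before the whole file is hashed.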