dc.contributor.author Gorenc, Sabina
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2022-11-23T09:36:26Z
dc.date.available 2022-11-23T09:36:26Z
dc.date.issued 2022-11-23
dc.identifier.uri http://hdl.handle.net/11356/1682
dc.description To increase the accessibility and diversity of easy reading in Slovenian and to create a prototype system that automatically simplifies texts in Slovenian, we prepared a Slovenian dataset of aligned simple and complex sentences, which can be used for further development of text simplification models for Slovenian. The dataset is a .json file that usually contains one complex ("kompleksni") and one simplified ("enostavni") sentence per row. However, if a complex sentence contains a lot of information, we translated it into more than one simplified sentence. Conversely, several complex sentences can correspond to a single simplified sentence when the information was spread across more than one complex sentence and we summarised it into one simplified sentence.
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby https://github.com/sabina-skubic/text-simplification-slovene/tree/main/master-thesis
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/sabina-skubic/text-simplification-slovene
dc.subject text simplification
dc.subject monolingual corpus
dc.title Slovene text simplification dataset SloTS
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/sabina-skubic/text-simplification-slovene/tree/main/example
contact.person Sabina Gorenc sabina.skubic@gmail.com University of Ljubljana
size.info 973 entries
files.count 1
files.size 186255
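
The row layout described in dc.description can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader. It assumes the file is a JSON array of objects, each with the keys "kompleksni" and "enostavni", and it assumes that a one-to-many alignment is stored by repeating the complex sentence across rows; the record confirms neither. The sample rows are hypothetical placeholders, not sentences from the dataset, and the names load_pairs and group_by_complex are illustrative.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the contents of textSimplification.json.
# ASSUMPTION: a JSON array of objects with the keys "kompleksni" (complex)
# and "enostavni" (simplified). The real file layout may differ.
SAMPLE = """
[
  {"kompleksni": "complex sentence A", "enostavni": "simple sentence A1"},
  {"kompleksni": "complex sentence A", "enostavni": "simple sentence A2"},
  {"kompleksni": "complex sentence B", "enostavni": "simple sentence B1"}
]
"""


def load_pairs(text):
    """Return (complex, simplified) tuples from JSON text in the assumed layout."""
    rows = json.loads(text)
    return [(row["kompleksni"], row["enostavni"]) for row in rows]


def group_by_complex(pairs):
    """Collect every simplified sentence aligned to the same complex sentence."""
    grouped = defaultdict(list)
    for complex_sentence, simple_sentence in pairs:
        grouped[complex_sentence].append(simple_sentence)
    return dict(grouped)


pairs = load_pairs(SAMPLE)
grouped = group_by_complex(pairs)
for complex_sentence, simples in grouped.items():
    print(complex_sentence, "->", simples)
```

To use this on the real file, read it with open("textSimplification.json", encoding="utf-8") and pass the text to load_pairs. If the keys or nesting differ from the assumption, adjust load_pairs accordingly.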


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name textSimplification.json
Size 181.89 KB
Format Unknown
Description Parallel Slovene sentences for text simplification
MD5 c9ffb28e0ea3d66940b87c851e13f9ea
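
The MD5 value and file size listed for textSimplification.json can be used to check a downloaded copy. The sketch below uses only the Python standard library. It assumes that files.size (186255) is a byte count, which is consistent with the listed 181.89 KB, and that the file was saved under its original name in the working directory. The function names md5_of_file and verify_download are illustrative.

```python
import hashlib
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "c9ffb28e0ea3d66940b87c851e13f9ea"
EXPECTED_SIZE = 186255  # ASSUMPTION: files.size is given in bytes


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True when both the size and the MD5 digest match."""
    path = Path(path)
    return path.stat().st_size == expected_size and md5_of_file(path) == expected_md5


if __name__ == "__main__":
    target = Path("textSimplification.json")  # assumed local file name
    if target.exists():
        print("checksum OK" if verify_download(target) else "checksum MISMATCH")
    else:
        print("file not found:", target)
```

A mismatch usually points to an incomplete download or a file that was modified after retrieval. MD5 is adequate for detecting such accidental corruption but not for guarding against deliberate tampering.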
