dc.contributor.author | Gorenc, Sabina |
dc.contributor.author | Robnik-Šikonja, Marko |
dc.date.accessioned | 2022-11-23T09:36:26Z |
dc.date.available | 2022-11-23T09:36:26Z |
dc.date.issued | 2022-11-23 |
dc.identifier.uri | http://hdl.handle.net/11356/1682 |
dc.description | To increase the accessibility and diversity of easy-to-read texts in Slovenian and to create a prototype system that automatically simplifies Slovenian texts, we prepared a dataset of aligned simple and complex Slovenian sentences, which can be used for further development of text simplification models for Slovenian. The dataset is a .json file that usually contains one complex ("kompleksni") and one simplified ("enostavni") sentence per row. However, if a complex sentence contains a lot of information, we translated it into more than one simplified sentence. Conversely, several complex sentences can correspond to one simplified sentence when information spread across more than one complex sentence was summarised into a single simplified one. |
dc.language.iso | slv |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.relation.isreferencedby | https://github.com/sabina-skubic/text-simplification-slovene/tree/main/master-thesis |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/sabina-skubic/text-simplification-slovene |
dc.subject | text simplification |
dc.subject | monolingual corpus |
dc.title | Slovene text simplification dataset SloTS |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://github.com/sabina-skubic/text-simplification-slovene/tree/main/example |
contact.person | Sabina Gorenc sabina.skubic@gmail.com University of Ljubljana |
size.info | 973 entries |
files.count | 1 |
files.size | 186255 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: textSimplification.json
- Size: 181.89 KB
- Format: Unknown
- Description: Parallel Slovene sentences for text simplification
- MD5: c9ffb28e0ea3d66940b87c851e13f9ea