dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Rozman, Tadeja
dc.contributor.author Stritar Kučuk, Mojca
dc.contributor.author Krek, Simon
dc.contributor.author Krapš Vodopivec, Irena
dc.contributor.author Stabej, Marko
dc.contributor.author Pori, Eva
dc.contributor.author Goli, Teja
dc.contributor.author Lavrič, Polona
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Kocjančič, Polonca
dc.contributor.author Klemenc, Bojan
dc.contributor.author Krsnik, Luka
dc.contributor.author Žagar, Aleš
dc.contributor.author Kosem, Iztok
dc.date.accessioned 2022-11-21T12:01:18Z
dc.date.available 2022-11-21T12:01:18Z
dc.date.issued 2022-11-21
dc.identifier.uri http://hdl.handle.net/11356/1716
dc.description The dataset comprises 36570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of language problems in these sentences. Teacher corrections are categorised into 180 types, using a hierarchically structured system of labels described in the attached document (in Slovenian). Every entry includes metadata such as the type of the source text, the educational stage of the author, and the type and region of the school where the text was produced (see README for more information). The data is exported from the Šolar 3.0 corpus (http://hdl.handle.net/11356/1589). The dataset is intended to make the data easier to use for didactic purposes, for statistical analyses of language problems in Slovenian primary and secondary education, and for machine learning.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/prop/en/
dc.subject error annotation
dc.subject student writing
dc.subject teacher corrections
dc.subject language didactics
dc.title Frequency list of language problems from Šolar 3.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Špela Arhar Holdt arharhs@ff.uni-lj.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 36570 entries
files.count 3
files.size 16195629


 Files in this item

 Download all files in item (15.45 MB)
Name: Frequency-list-of-language-problems-from-Solar-3.0.tsv
Size: 14.61 MB
Format: Unknown
Description: Dataset in TSV
MD5: f19b3e606d0715bb2d8685d97ef9a973
Name: README.txt
Size: 2.22 KB
Format: Text file
Description: Information on the dataset in TXT
MD5: 8b226231bd3d27fa83c8579a52155f34
File Preview
***************

SLO: Podatkovni niz vsebuje povedi z jezikovnimi napakami in popravljene povedi, kakor tudi dodatne informacije o značilnostih izvornega besedila. Za več informacij gl. vnos korpusa Šolar 3.0 na repozitoriju in priložene označevalne smernice.
ENG: The dataset comprises sentences with language errors and corresponding corrected sentences, together with additional information on the text features. Please refer to the original corpus dataset and the annotation guidelines for detailed information.

Arhar Holdt, Špela; et al., 2022, Developmental corpus Šolar 3.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1589.

***************

"ID_besedila_s": SLO: ID izvornega besedila v korpusu Šolar 3.0. ENG: An ID of the source text in the Šolar 3.0 corpus.

"ID_odstavka_s": SLO: ID izvornega odstavka v korpusu Šolar 3.0. ENG: An ID of the source paragraph in the Šolar 3.0 corpus.

"ID_stavka_s": SLO: ID izvorne povedi v korpusu Šolar . . .
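As a rough illustration of how the TSV file might be read, the sketch below counts rows per source text. It assumes the file is UTF-8, tab-separated, and starts with a header row containing the column names quoted in the README preview; none of this is confirmed by the preview itself. The helper name `count_by_column` and the sample rows are hypothetical and only stand in for real data.

```python
import csv
import io
from collections import Counter


def count_by_column(tsv_stream, column):
    """Count how often each value occurs in one column of a tab-separated stream."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(tsv_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    if not reader.fieldnames or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in the header row")
    return Counter(row[column] for row in reader)


# Made-up sample rows; only the three column names come from the README preview.
SAMPLE = (
    "ID_besedila_s\tID_odstavka_s\tID_stavka_s\n"
    "text-1\ttext-1.p1\ttext-1.p1.s1\n"
    "text-1\ttext-1.p1\ttext-1.p1.s2\n"
    "text-2\ttext-2.p1\ttext-2.p1.s1\n"
)

sample_counts = count_by_column(io.StringIO(SAMPLE), "ID_besedila_s")
for text_id, n_rows in sample_counts.most_common():
    print(text_id, n_rows)
```

For the downloaded file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, provided the assumptions above hold for the actual data.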
                                            
Name: Smernice-za-oznacevanje-korpusa-Solar_V1.1.pdf
Size: 856.9 KB
Format: PDF
Description: Error annotation guidelines (in Slovenian)
MD5: c8b8b68fd1be51e1edadb7dd249b3ab4
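The MD5 values listed for the files can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_stream` and the local file path mentioned afterwards are assumptions for illustration, not part of the record.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# MD5 listed in this record for the TSV file.
EXPECTED_TSV_MD5 = "f19b3e606d0715bb2d8685d97ef9a973"

# Demonstration on in-memory bytes; a real check would open the downloaded file.
demo_digest = md5_of_stream(io.BytesIO(b"example bytes"))
print(demo_digest)
```

To check a real download, one could open the saved file in binary mode (`open(path, "rb")`) and compare the result of `md5_of_stream` with `EXPECTED_TSV_MD5`; a mismatch would suggest an incomplete or altered file.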