dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Rozman, Tadeja |
dc.contributor.author | Stritar Kučuk, Mojca |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Krapš Vodopivec, Irena |
dc.contributor.author | Stabej, Marko |
dc.contributor.author | Pori, Eva |
dc.contributor.author | Goli, Teja |
dc.contributor.author | Lavrič, Polona |
dc.contributor.author | Laskowski, Cyprian |
dc.contributor.author | Kocjančič, Polonca |
dc.contributor.author | Klemenc, Bojan |
dc.contributor.author | Krsnik, Luka |
dc.contributor.author | Žagar, Aleš |
dc.contributor.author | Kosem, Iztok |
dc.date.accessioned | 2022-11-21T12:01:18Z |
dc.date.available | 2022-11-21T12:01:18Z |
dc.date.issued | 2022-11-21 |
dc.identifier.uri | http://hdl.handle.net/11356/1716 |
dc.description | The dataset comprises 36570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of the language problems in these sentences. Teacher corrections are categorised into 180 types, using a hierarchically structured system of labels described in the attached document (in Slovenian). Every entry is accompanied by metadata such as the type of the source text, the educational stage of the author, and the type and region of the school where the text was produced (see README for more information). The data is exported from the Šolar 3.0 corpus (http://hdl.handle.net/11356/1589). The dataset is intended to make the data easier to access for didactic purposes, for statistical analyses of language problems in Slovenian primary and secondary education, and for machine learning. |
dc.language.iso | slv |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.cjvt.si/prop/en/ |
dc.subject | error annotation |
dc.subject | student writing |
dc.subject | teacher corrections |
dc.subject | language didactics |
dc.title | Frequency list of language problems from Šolar 3.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | other |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Špela Arhar Holdt arharhs@ff.uni-lj.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
size.info | 36570 entries |
files.count | 3 |
files.size | 16195629 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

- Name
- Frequency-list-of-language-problems-from-Solar-3.0.tsv
- Size
- 14.61 MB
- Format
- Unknown
- Description
- Dataset in TSV
- MD5
- f19b3e606d0715bb2d8685d97ef9a973

- Name
- README.txt
- Size
- 2.22 KB
- Format
- Text file
- Description
- Information on the dataset in TXT
- MD5
- 8b226231bd3d27fa83c8579a52155f34
***************
The dataset comprises sentences with language errors and the corresponding corrected sentences, together with additional information on the features of the source text. Please refer to the original corpus dataset and the annotation guidelines for detailed information.
Arhar Holdt, Špela; et al., 2022, Developmental corpus Šolar 3.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1589.
***************
"ID_besedila_s": An ID of the source text in the Šolar 3.0 corpus.
"ID_odstavka_s": An ID of the source paragraph in the Šolar 3.0 corpus.
"ID_stavka_s": An ID of the source sentence in the Šolar . . .
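The README preview names ID columns that link each entry back to the Šolar 3.0 corpus. A minimal sketch of how those columns might be used to count entries per source text is given here. It assumes, without confirmation from this record, that the TSV begins with a header row containing the column name "ID_besedila_s" and that fields are not quoted. The helper name `count_entries_per_text` and the sample IDs are hypothetical.

```python
import csv
import io
from collections import Counter


def count_entries_per_text(tsv_file, text_id_column="ID_besedila_s"):
    """Count dataset entries per source text ID.

    Assumption (not confirmed by the record): the TSV has a header row
    and uses plain tab separation without quoting.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or text_id_column not in reader.fieldnames:
        raise KeyError(f"column {text_id_column!r} not found in header")
    return Counter(row[text_id_column] for row in reader)


# Tiny made-up sample standing in for the real file; the IDs are invented.
sample = io.StringIO(
    "ID_besedila_s\tID_odstavka_s\tID_stavka_s\n"
    "text-1\ttext-1.p1\ttext-1.p1.s1\n"
    "text-1\ttext-1.p1\ttext-1.p1.s2\n"
    "text-2\ttext-2.p1\ttext-2.p1.s1\n"
)
counts = count_entries_per_text(sample)
print(counts.most_common())  # [('text-1', 2), ('text-2', 1)]
```

For the downloaded file, the same function could be called on a handle opened with `open(path, encoding="utf-8", newline="")`. The UTF-8 encoding is likewise an assumption, so the README should be checked for the actual column list and format.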

- Name
- Smernice-za-oznacevanje-korpusa-Solar_V1.1.pdf
- Size
- 856.9 KB
- Format
- Description
- Error annotation guidelines (in Slovenian)
- MD5
- c8b8b68fd1be51e1edadb7dd249b3ab4
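
The MD5 values listed for the three files can be used to check that a download is complete. A minimal sketch with Python's standard `hashlib` follows. The function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files were saved under the names shown in the listing.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "Frequency-list-of-language-problems-from-Solar-3.0.tsv": "f19b3e606d0715bb2d8685d97ef9a973",
    "README.txt": "8b226231bd3d27fa83c8579a52155f34",
    "Smernice-za-oznacevanje-korpusa-Solar_V1.1.pdf": "c8b8b68fd1be51e1edadb7dd249b3ab4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected filename to True (match), False (mismatch),
    or None (file not found in the directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{status:9} {name}")
```

MD5 is used here only because it is the checksum the repository publishes. It detects corrupted or truncated downloads but is not suitable as a security guarantee.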