dc.contributor.author | Kosem, Iztok |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Stritar Kučuk, Mojca |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Krapš Vodopivec, Irena |
dc.contributor.author | Stabej, Marko |
dc.contributor.author | Pori, Eva |
dc.contributor.author | Goli, Teja |
dc.contributor.author | Lavrič, Polona |
dc.contributor.author | Laskowski, Cyprian |
dc.contributor.author | Kocjančič, Polonca |
dc.contributor.author | Klemenc, Bojan |
dc.contributor.author | Rozman, Tadeja |
dc.date.accessioned | 2019-11-08T07:59:08Z |
dc.date.available | 2019-11-08T07:59:08Z |
dc.date.issued | 2019-07-08 |
dc.identifier.uri | http://hdl.handle.net/11356/1214 |
dc.description | The Developmental corpus Šolar 2.0 consists of 5,485 texts written by students in Slovene secondary schools (ages 15-19) and pupils in the 7th-9th grade of primary school (ages 13-15), with a small percentage also from the 6th grade. School essays form the majority of the corpus, while other material includes texts created during lessons, such as text recapitulations or descriptions, examples of formal applications, etc. Most of the texts were produced in Slovenian language classes. Part of the corpus (2,094 texts) is annotated with teachers' corrections using a system of labels described in the attached document (in Slovene). Teacher corrections were part of the original files and reflect real classroom situations of essay marking. Corrections were then inserted into texts by annotators, and subsequently categorized. This corpus also exists in two derived versions: Šolar Clear (http://hdl.handle.net/11356/1219), which contains only the students' text without the teacher corrections, and Šolar Error (http://hdl.handle.net/11356/1231), which contains only those sentences that have teacher corrections. |
dc.language.iso | slv |
dc.publisher | Trojina, Institute for Applied Slovene Studies |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.relation.replaces | http://hdl.handle.net/11356/1036 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1589 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.cjvt.si/raziskovalno-delo/projekti-cjvt/korpus-solar/ |
dc.subject | developmental corpus |
dc.subject | error annotation |
dc.subject | student writing |
dc.title | Developmental corpus Šolar 2.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Iztok Kosem iztok.kosem@ff.uni-lj.si Centre for Language Resources and Technologies, University of Ljubljana |
sponsor | Ministry of Culture 3340-15-141006 Upgrade of Šolar Corpus nationalFunds |
sponsor | ARRS (Slovenian Research Agency) I0-0051 Centre for Applied Linguistics (CUJ) nationalFunds |
sponsor | Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other |
sponsor | University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds |
size.info | 5485 texts |
size.info | 1748858 tokens |
size.info | 1638229 words |
files.count | 2 |
files.size | 22747429 |
Files in this item

- Name: Solar2.0.zip
- Size: 21 MB
- Format: application/zip
- Description: Corpus in XML format
- MD5: 2080f7b29a5932d57bf59ea9c7e056d8

- Name: Smernice za označevanje korpusa Šolar 2.0 (v1.0).pdf
- Size: 714.09 KB
- Format:
- Description: Guidelines for corpus annotation
- MD5: df44421fe80ec4efad1e8741fd5905e1