
 
dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Stritar Kučuk, Mojca
dc.contributor.author Krek, Simon
dc.contributor.author Krapš Vodopivec, Irena
dc.contributor.author Stabej, Marko
dc.contributor.author Pori, Eva
dc.contributor.author Goli, Teja
dc.contributor.author Lavrič, Polona
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Kocjančič, Polonca
dc.contributor.author Klemenc, Bojan
dc.contributor.author Rozman, Tadeja
dc.date.accessioned 2019-11-08T07:59:08Z
dc.date.available 2019-11-08T07:59:08Z
dc.date.issued 2019-07-08
dc.identifier.uri http://hdl.handle.net/11356/1214
dc.description The developmental corpus Šolar 2.0 consists of 5,485 texts written by students in Slovene secondary schools (ages 15-19) and by pupils in grades 7-9 of primary school (ages 13-15), with a small percentage also from grade 6. School essays form the majority of the corpus; other material includes texts created during lessons, such as text recapitulations, descriptions, and examples of formal applications. Most of the texts were produced in Slovenian language classes. Part of the corpus (2,094 texts) is annotated with teachers' corrections using a system of labels described in the attached document (in Slovene). The teacher corrections were part of the original files and reflect real classroom essay marking. Annotators then inserted the corrections into the texts and subsequently categorized them. The corpus also exists in two derived versions: Šolar Clear (http://hdl.handle.net/11356/1219), which contains only the students' text without the teacher corrections, and Šolar Error (http://hdl.handle.net/11356/1231), which contains only those sentences that have teacher corrections.
dc.language.iso slv
dc.publisher Trojina, Institute for Applied Slovene Studies
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.replaces http://hdl.handle.net/11356/1036
dc.relation.isreplacedby http://hdl.handle.net/11356/1589
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/raziskovalno-delo/projekti-cjvt/korpus-solar/
dc.subject developmental corpus
dc.subject error annotation
dc.subject student writing
dc.title Developmental corpus Šolar 2.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Iztok Kosem iztok.kosem@ff.uni-lj.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor Ministry of Culture 3340-15-141006 Upgrade of Šolar Corpus nationalFunds
sponsor ARRS (Slovenian Research Agency) I0-0051 Centre for Applied Linguistics (CUJ) nationalFunds
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
size.info 5485 texts
size.info 1748858 tokens
size.info 1638229 words
files.count 2
files.size 22747429


 Files in this item

Name Solar2.0.zip
Size 21 MB
Format application/zip
Description Corpus in XML format
MD5 2080f7b29a5932d57bf59ea9c7e056d8

Name Smernice za označevanje korpusa Šolar 2.0 (v1.0).pdf
Size 714.09 KB
Format PDF
Description Guidelines for corpus annotation
MD5 df44421fe80ec4efad1e8741fd5905e1
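
The MD5 values listed for the two files can be used to check that a download arrived intact. The following is a minimal sketch, assuming the files were saved locally under the names shown in this record; the helper names `md5_of_file` and `verify_download` are illustrative and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this record.
EXPECTED_MD5 = {
    "Solar2.0.zip": "2080f7b29a5932d57bf59ea9c7e056d8",
    "Smernice za označevanje korpusa Šolar 2.0 (v1.0).pdf": "df44421fe80ec4efad1e8741fd5905e1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a local file against the checksum listed for its file name.

    Assumes the file kept the name shown in the record (hypothetical setup).
    """
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no listed checksum for {path.name}")
    return md5_of_file(path) == expected
```

Calling `verify_download("Solar2.0.zip")` returns `True` only if the local file's digest matches the listed value. A mismatch usually points to an incomplete or corrupted download.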
