dc.contributor.author | Kosem, Iztok |
dc.contributor.author | Arhar Holdt, Špela |
dc.contributor.author | Stritar Kučuk, Mojca |
dc.contributor.author | Krek, Simon |
dc.contributor.author | Krapš Vodopivec, Irena |
dc.contributor.author | Stabej, Marko |
dc.contributor.author | Kocjančič, Polonca |
dc.contributor.author | Laskowski, Cyprian |
dc.contributor.author | Klemenc, Bojan |
dc.contributor.author | Pori, Eva |
dc.contributor.author | Rozman, Tadeja |
dc.date.accessioned | 2019-11-08T07:58:49Z |
dc.date.available | 2019-11-08T07:58:49Z |
dc.date.issued | 2019-07-08 |
dc.identifier.uri | http://hdl.handle.net/11356/1219 |
dc.description | Šolar 2.0 Clear is an adapted version of the Šolar 2.0 corpus, cf. http://hdl.handle.net/11356/1214. The Šolar 2.0 Clear corpus consists of texts written by students in Slovene primary and secondary schools. School essays form the majority of the corpus, while other material includes texts created during lessons, such as text recapitulations or descriptions, examples of formal applications, etc. For each text, information on the school (elementary or secondary), subject, level (grade or year), type of text, region and date of production is provided. Unlike the original Šolar 2.0 corpus (http://hdl.handle.net/11356/1214), Šolar 2.0 Clear includes student texts only: error annotations and other types of feedback from the teachers have been removed. The corpus can thus be used for processing tasks where the inclusion of corrections hinders or complicates the procedures (e.g. comparative data extraction, training of language models, etc.). |
dc.language.iso | slv |
dc.publisher | Trojina, Institute for Applied Slovene Studies |
dc.publisher | Centre for Language Resources and Technologies, University of Ljubljana |
dc.relation.replaces | http://hdl.handle.net/11356/1150 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1589 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.cjvt.si/raziskovalno-delo/projekti-cjvt/korpus-solar/ |
dc.subject | student writing |
dc.subject | developmental corpus |
dc.title | Developmental corpus (without language corrections) Šolar 2.0 Clear |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Iztok Kosem; iztok.kosem@trojina.si; Trojina, Institute for Applied Slovene Studies |
sponsor | ARRS (Slovenian Research Agency); I0-0051; Centre for Applied Linguistics (CUJ); nationalFunds |
sponsor | Ministry of Culture; 3340-15-141006; Upgrade of Šolar Corpus; nationalFunds |
sponsor | University of Ljubljana; I0-0022; Network of Research Infrastructure Centres (MRIC); nationalFunds |
size.info | 5485 texts |
size.info | 1638229 words |
size.info | 1907731 tokens |
files.count | 2 |
files.size | 30636315 |
Files in this item
All files in item: 29.22 MB
This item is publicly available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

- Name: Solar2.0-Clear.zip
- Size: 20.01 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: d64dcf5c3ddbb851771f435a5d2af58a
- Contents:
  - Solar2.0-Clear
  - solar2-clear.xml (156 MB)
  - schema
  - tei_clarin_schema.xml (3 kB)
  - tei_clarin.rnc (305 kB)
  - tei_clarin.dtd (239 kB)
  - tei_clarin.sch (496 B)
  - tei_clarin.xsd (667 kB)
  - tei_clarin.rng (612 kB)
  - dcr.tmp (1 kB)
  - 00README.txt (237 B)

- Name: Solar2.0-Clear.vert.zip
- Size: 9.21 MB
- Format: application/zip
- Description: Corpus in derived vertical (Sketch Engine / CQP) format
- MD5: 025edfded5d2e17c58697ea5a55d7d09
- Contents:
  - Solar2.0-Clear.vert
  - solar.vert (49 MB)
  - solar.regi (3 kB)
  - 00README.txt (237 B)
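
The MD5 values listed for the two archives can be used to check that a download is intact. The following is a minimal sketch, not part of the repository record: it assumes both zip files were saved under their listed names in the current directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "Solar2.0-Clear.zip": "d64dcf5c3ddbb851771f435a5d2af58a",
    "Solar2.0-Clear.vert.zip": "025edfded5d2e17c58697ea5a55d7d09",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (file not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify_downloads().items():
        print(f"{name}: {labels[outcome]}")
```

Reading in chunks keeps memory use low for the larger archive; a mismatch most likely indicates an incomplete or corrupted download, in which case the file should be fetched again.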