Show simple item record

 
dc.contributor.author Stritar Kučuk, Mojca
dc.contributor.author Šter, Helena
dc.contributor.author Pisek, Staša
dc.contributor.author Petric Lasnik, Ivana
dc.contributor.author Kete Matičič, Jana
dc.contributor.author Pirih Svetina, Nataša
dc.contributor.author Preglau, Daniela
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Krsnik, Luka
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Pegan, Jasmina
dc.contributor.author Huber, Damjan
dc.date.accessioned 2025-11-22T14:16:18Z
dc.date.available 2025-11-22T14:16:18Z
dc.date.issued 2025-11-18
dc.identifier.uri http://hdl.handle.net/11356/2066
dc.description The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 10,590 texts (almost 1.4 million words) written by adult speakers whose first language is not Slovene. The corpus offers insights into Slovene as produced by those who are still learning it as a second or foreign language, and in particular into the most common errors that occur in this process. KOST is therefore aimed at everyone working with Slovene as a second or foreign language. The texts were mainly written at lectorates and in courses of Slovene as an L2/FL. Most of the authors speak Serbian, Bosnian or Macedonian as their first language, but texts by speakers of other languages are also included. The authors are at different proficiency levels in Slovene, from beginner to advanced. For each contributor, information is available on gender, year of birth, country, first language and other languages spoken, employment status and education, and prior experience of learning Slovene. For each text, there is also information on the time and circumstances of creation (exam or homework), the programme in which it was produced, input type (digital or hand-written), language level and grade. Part of the corpus is also available in a corrected version. The tokens of the original and corrected texts are linked (one group of links per paragraph) and the links are categorised into 23 error types. The corpus is available in two formats: (1) TEI encoding of the complete corpus (texts, links), including contributor and text metadata in the TEI header, and (2) the corpus in the original and corrected variants as vertical and corresponding registry files, suitable for mounting on CQP-type concordancers. Note that the vertical format does not retain the connection between the original and corrected tokens.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://centerslo.si/wp-content/uploads/2022/11/Stritar-Kucuk_Obdobja-41.pdf
dc.relation.isreferencedby https://centerslo.si/wp-content/uploads/2022/11/Arhar-Holdt-et-al_Obdobja-41.pdf
dc.relation.replaces http://hdl.handle.net/11356/1887
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://viri.cjvt.si/kost/
dc.subject learner corpus
dc.subject second language
dc.subject foreign language
dc.title Slovene learner corpus KOST 2.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Mojca Stritar Kučuk mojca.stritarkucuk@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
sponsor Ministry of Culture - Upgrade of KOST and KUUS corpora for Slovenian as second or foreign language nationalFunds
sponsor Ministry of Higher Education, Science and Innovation Expansion of the corpus of Slovene as a foreign language KOST 2.0 to KOST 2.1 Expansion of the corpus of Slovene as a foreign language KOST 2.0 to KOST 2.1 nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 10590 texts
size.info 1376188 words
files.count 2
files.size 175835014
featuredService.kontext search original|https://www.clarin.si/kontext/query?corpname=kost21_orig
featuredService.kontext search corrected|https://www.clarin.si/kontext/query?corpname=kost21_corr
featuredService.noske search original|https://www.clarin.si/ske/#dashboard?corpname=kost21_orig
featuredService.noske search corrected|https://www.clarin.si/ske/#dashboard?corpname=kost21_corr


 Files in this item

 Total size of all files in this item: 167.69 MB
Name: KOST.TEI.zip
Size: 79.31 MB
Format: application/zip
Description: Corpus in TEI format
MD5: c960a957913c0a251cbd8ff60eb87c82
File preview:
  • KOST.TEI
    • kost-orig.xml (365 MB)
    • schema
      • tei_clarin_example.xml (48 kB)
      • tei_clarin.rnc (331 kB)
      • tei_clarin_schema.xml (70 kB)
      • README.md (525 B)
      • tei_clarin.rng (707 kB)
    • kost-corr.xml (366 MB)
    • kost-errs.xml (106 MB)
    • kost.xml (34 kB)
    • mte-msd.xml (2 MB)
    • 00README.txt (763 B)
Name: KOST.vert.zip
Size: 88.38 MB
Format: application/zip
Description: Corpus in derived vertical format
MD5: c61fb2b7072ace545bce3fa76c5c3071
File preview:
  • KOST.vert
    • kost21_orig.vert (225 MB)
    • kost20_corr.vert (212 MB)
    • kost21_corr.regi (5 kB)
    • kost20_orig.vert (211 MB)
    • kost21_orig.regi (5 kB)
    • kost20_corr.regi (4 kB)
    • 00README.txt (586 B)
    • kost20_orig.regi (4 kB)
    • kost21_corr.vert (226 MB)
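
The MD5 values listed for the two archives can be used to check a download for corruption. The following is a minimal sketch in Python, assuming the archives have been saved locally under their listed names; the helper names `md5_of_file` and `verify_download` are illustrative and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "KOST.TEI.zip": "c960a957913c0a251cbd8ff60eb87c82",
    "KOST.vert.zip": "c61fb2b7072ace545bce3fa76c5c3071",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a local archive against the checksum listed for its file name."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"No checksum listed for {path.name}")
    return md5_of_file(path) == expected
```

Calling `verify_download("KOST.TEI.zip")` on a complete, unmodified download should return `True`; a `False` result suggests the file should be fetched again.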
