<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href='static/style.xsl' type='text/xsl'?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-21T22:52:22Z</responseDate><request verb="GetRecord" identifier="oai:www.clarin.si:11356/2052" metadataPrefix="oai_dc">http://www.clarin.si/repository/oai/request</request><GetRecord><record><header><identifier>oai:www.clarin.si:11356/2052</identifier><datestamp>2026-04-08T11:53:59Z</datestamp><setSpec>hdl_11356_1023</setSpec><setSpec>hdl_11356_1024</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Dataset of Authentic and Synthetic Slovene Language Errors DASSLE 1.0</dc:title>
<dc:creator>Arhar Holdt, Špela</dc:creator>
<dc:creator>Antloga, Špela</dc:creator>
<dc:creator>Gantar, Polona</dc:creator>
<dc:creator>Munda, Tina</dc:creator>
<dc:creator>Robida, Nejc</dc:creator>
<dc:creator>Zgonc, Matjaž</dc:creator>
<dc:subject>grammatical error correction</dc:subject>
<dc:subject>language problem</dc:subject>
<dc:subject>error annotation</dc:subject>
<dc:subject>evaluation</dc:subject>
<dc:description>DASSLE 1.0 (Dataset of Authentic and Synthetic Slovene Language Errors) comprises 7,385 manually prepared entries, each consisting of a Slovene sentence containing a single, specific language problem, its corrected version, and metadata including both coarse- and fine-grained correction classifications, as well as the source of the example.&#xd;
&#xd;
Language problems are divided into five top-level categories: spelling, orthography, morphology, vocabulary, and syntax. These are further specified using 128 fine-grained error types, aligned with the typology developed for the Šolar 3.0 corpus. The typology is explained at https://wiki.cjvt.si/books/11-developmental-corpus-solar/page/introduction-to-tags and in more detail in the annotation guidelines at https://wiki.cjvt.si/books/11-developmental-corpus-solar/page/annotation-guidelines.&#xd;
&#xd;
The examples in DASSLE 1.0 come from four sources and combine authentic and synthetic data. From Šolar 3.0, the corpus of student writing with teacher-provided corrections, sentences were manually reviewed and edited so that each contains only one clearly defined language problem. From Gigafida 2.0, the reference corpus of standard written Slovene, examples were either manually corrected or deliberately corrupted to introduce typical deviations from the current norm. Synthetic examples were generated using GPT-4o, which was prompted with authentic sentence pairs; the outputs were manually reviewed to select only those most closely resembling natural language use. A small number of examples were collected from Jezikovna svetovalnica, based on real language queries submitted by speakers.&#xd;
&#xd;
The dataset is primarily intended for developing and evaluating natural language processing tools that automatically detect and correct errors in Slovene. It is available in TSV format, accompanied by a README document that describes its contents in more detail.</dc:description>
<dc:date>2025-09-30</dc:date>
<dc:type>lexicalConceptualResource</dc:type>
<dc:identifier>http://hdl.handle.net/11356/2052</dc:identifier>
<dc:language>slv</dc:language>
<dc:rights>Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
<dc:rights>PUB</dc:rights>
<dc:format>text/plain; charset=utf-8</dc:format>
<dc:format>application/zip</dc:format>
<dc:format>downloadable_files_count: 1</dc:format>
<dc:publisher>Centre for Language Resources and Technologies, University of Ljubljana</dc:publisher>
<dc:source>https://www.cjvt.si/llm4dh/en/work-packages/work-package-2/#task-2.3</dc:source>
</oai_dc:dc>
</metadata></record></GetRecord></OAI-PMH>