<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href='static/style.xsl' type='text/xsl'?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-04T07:41:59Z</responseDate><request verb="GetRecord" identifier="oai:www.clarin.si:11356/2062" metadataPrefix="oai_dc">http://www.clarin.si/repository/oai/request</request><GetRecord><record><header><identifier>oai:www.clarin.si:11356/2062</identifier><datestamp>2026-03-04T12:48:26Z</datestamp><setSpec>hdl_11356_1023</setSpec><setSpec>hdl_11356_1024</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Training corpus of spoken Slovenian ROG 1.1</dc:title>
<dc:creator>Verdonik, Darinka</dc:creator>
<dc:creator>Dobrovoljc, Kaja</dc:creator>
<dc:creator>Rupnik, Peter</dc:creator>
<dc:creator>Ljubešić, Nikola</dc:creator>
<dc:creator>Majhenič, Simona</dc:creator>
<dc:creator>Čibej, Jaka</dc:creator>
<dc:creator>Schmidt, Thomas</dc:creator>
<dc:creator>Vidinić, Jasna</dc:creator>
<dc:subject>speech transcription</dc:subject>
<dc:subject>speech recordings</dc:subject>
<dc:subject>universal dependencies</dc:subject>
<dc:subject>syntax</dc:subject>
<dc:subject>disfluencies</dc:subject>
<dc:subject>prosody</dc:subject>
<dc:subject>dialogue act</dc:subject>
<dc:subject>spoken corpus</dc:subject>
<dc:description>Training corpus of spoken Slovenian ROG 1.1 is an improved version of the ROG 1.0 corpus (http://hdl.handle.net/11356/1992). The main differences between the original and the current version are:&#xd;
- Manually corrected Prosodic Unit annotations in ROG-Art&#xd;
- Release of ROG-Art in ISO TEI format&#xd;
- Omission of TextGrid files&#xd;
&#xd;
The current version preserves the extent of the data and its composition:&#xd;
&#xd;
1. ROG-SST, which includes selected Gos 2.1 (http://hdl.handle.net/11356/1863) transcriptions with: &#xd;
- manually assigned lemmas and morphosyntactic tags according to the MULTEXT-East annotation scheme (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), &#xd;
- manual annotations according to the Universal Dependencies annotation scheme (i.e. part-of-speech categories, morphological features and syntactic dependencies)&#xd;
&#xd;
In total, ROG-SST spans 76341 words and 6108 sentences. ROG-SST is distributed in CoNLL-U format (2014-2024) (.conllu files). Project website: https://spot.ff.uni-lj.si/en/.&#xd;
&#xd;
2. ROG-Art, which includes: &#xd;
- all annotation layers from ROG-SST&#xd;
- prosodic unit annotations&#xd;
- disfluency annotations&#xd;
- dialogue act annotations&#xd;
&#xd;
ROG-Art is distributed as:&#xd;
- EXMARaLDA format (.EXB files) for viewing with Partitur Editor (https://www.exmaralda.org/)&#xd;
- .EXS files and Rog-Art.coma file for searching through the annotated corpus in the EXMARaLDA EXAKT concordancer (https://www.exmaralda.org/)&#xd;
- .TRS files for viewing the transcriptions without annotations with Transcriber (https://trans.sourceforge.net/en/presentation.php)&#xd;
- ISO TEI files for cross-platform compatibility.&#xd;
&#xd;
ROG-Art consists of 39001 words in 1969 sentences. WAV files are available only for the ROG-Art part. They must be copied to the WAV folder of the ROG-Art folder structure so that EXMARaLDA and Transcriber can open them automatically. The WAV recordings are single-channel, sampled at 44100 Hz with 16-bit precision.</dc:description>
<dc:date>2026-03-04</dc:date>
<dc:type>corpus</dc:type>
<dc:identifier>http://hdl.handle.net/11356/2062</dc:identifier>
<dc:language>slv</dc:language>
<dc:relation>https://doi.org/10.5281/zenodo.13936426</dc:relation>
<dc:relation>http://hdl.handle.net/11356/1992</dc:relation>
<dc:rights>Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
<dc:rights>PUB</dc:rights>
<dc:format>application/zip</dc:format>
<dc:format>application/zip</dc:format>
<dc:format>text/plain; charset=utf-8</dc:format>
<dc:format>downloadable_files_count: 2</dc:format>
<dc:publisher>Faculty of Electrical Engineering and Computer Science, University of Maribor</dc:publisher>
<dc:publisher>Jožef Stefan Institute</dc:publisher>
<dc:publisher>Faculty of Arts, University of Ljubljana</dc:publisher>
<dc:source>https://mezzanine.um.si/en/mezzanine-english/</dc:source>
</oai_dc:dc>
</metadata></record></GetRecord></OAI-PMH>