| dc.contributor.author | Verdonik, Darinka |
| dc.contributor.author | Rupnik, Peter |
| dc.contributor.author | Vidinić, Jasna |
| dc.contributor.author | Ljubešić, Nikola |
| dc.date.accessioned | 2025-12-04T10:13:05Z |
| dc.date.available | 2025-12-04T10:13:05Z |
| dc.date.issued | 2025-12-02 |
| dc.identifier.uri | http://hdl.handle.net/11356/2073 |
| dc.description | The corpus of spoken Slovenian ROG-Dialog consists of volunteered audio recorded by students, who asked their relatives or acquaintances to talk on record in their homes. The speakers were directed to use various styles of dialogue, including instructions, interviews, discussions, storytelling, and chatting. Dialogue themes were freely chosen; the most prevalent include travelling, health, childhood memories, work, technology, food, and entertainment. Recordings and metadata were uploaded to the Govorjena Slovenščina web portal (https://govorjena-slovenscina.um.si/), manually segmented, transcribed in both colloquial and standardized orthographic transcriptions, and annotated with dialogue acts and sentiment. The 25 speakers in this corpus cover all statistical regions of Slovenia, with ages ranging from 21 to 82 years, and come from both rural and urban areas. Reflecting this geographic and social diversity, speech samples range from standard colloquial registers to local dialects, with some speakers employing distinct regional varieties. ROG-Dialog is distributed as: (1) EXMARaLDA format (.EXB files) for viewing with Partitur Editor (https://www.exmaralda.org/); (2) .EXS files and the ROG-Dia.coma file for searching through the annotated corpus in the EXMARaLDA EXAKT concordancer (https://www.exmaralda.org/); (3) .TRS files for viewing the transcriptions without annotations in Transcriber (https://trans.sourceforge.net/en/presentation.php); (4) .TXT plain-text files. ROG-Dialog data were compiled to complement the ROG-Artur subcorpus of the ROG 1.0 training corpus of spoken Slovenian (http://hdl.handle.net/11356/1992). However, the two corpora differ in their annotation levels, and harmonising them remains a task for a future merge. |
| dc.language.iso | slv |
| dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
| dc.publisher | Jožef Stefan Institute |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://govorjena-slovenscina.um.si/ |
| dc.subject | speech transcription |
| dc.subject | speech recordings |
| dc.subject | dialogue act |
| dc.subject | spoken corpus |
| dc.subject | spoken language |
| dc.subject | sentiment classification |
| dc.title | Corpus of spoken Slovenian ROG-Dialog 1.0 |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | audio |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Peter Rupnik peter.rupnik@ijs.si Jožef Stefan Institute |
| contact.person | Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
| sponsor | ARIS (Slovenian Research and Innovation Agency) GC-0002 LLM4DH: Large Language Models for Digital Humanities nationalFunds |
| sponsor | ARRS (Slovenian Research Agency) P2-0069 Advanced methods of interaction in telecommunications nationalFunds |
| size.info | 50486 words |
| size.info | 5.167 hours |
| files.count | 2 |
| files.size | 1310821223 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: ROG-Dialog.zip
- Size: 4.65 MB
- Format: application/zip
- Description: ROG Dialog transcriptions, annotations, and metadata
- MD5: 1dffd3e9703b0389ed4cb8deeb19223e
- ROG-Dialog
- DATA
- EXS
- ROG-Dia-GSO-P0022_s.exs (1 MB)
- ROG-Dia-GSO-P0005_s.exs (1 MB)
- ROG-Dia-GSO-P0007_s.exs (2 MB)
- ROG-Dia-GSO-P0019_s.exs (1 MB)
- ROG-Dia-GSO-P0009_s.exs (1 MB)
- ROG-Dia-GSO-P0016_s.exs (1 MB)
- ROG-Dia-GSO-P0012_s.exs (1 MB)
- ROG-Dia-GSO-P0018_s.exs (1 MB)
- ROG-Dia-GSO-P0021_s.exs (1 MB)
- ROG-Dia-GSO-P0008_s.exs (1 MB)
- ROG-Dia-GSO-P0011_s.exs (1 MB)
- ROG-Dia-GSO-P0025_s.exs (1 MB)
- ROG-Dia.coma (65 kB)
- TXT
- ROG-Dia-GSO-P0009-pog.txt (33 kB)
- ROG-Dia-GSO-P0007-pog.txt (46 kB)
- ROG-Dia-GSO-P0016-std.txt (25 kB)
- ROG-Dia-GSO-P0005-pog.txt (30 kB)
- ROG-Dia-GSO-P0022-std.txt (34 kB)
- ROG-Dia-GSO-P0011-pog.txt (20 kB)
- ROG-Dia-GSO-P0008-std.txt (23 kB)
- ROG-Dia-GSO-P0012-std.txt (23 kB)
- ROG-Dia-GSO-P0007-std.txt (48 kB)
- ROG-Dia-GSO-P0005-std.txt (30 kB)
- ROG-Dia-GSO-P0011-std.txt (20 kB)
- ROG-Dia-GSO-P0019-pog.txt (29 kB)
- ROG-Dia-GSO-P0021-pog.txt (34 kB)
- ROG-Dia-GSO-P0018-std.txt (34 kB)
- ROG-Dia-GSO-P0008-pog.txt (22 kB)
- ROG-Dia-GSO-P0012-pog.txt (23 kB)
- ROG-Dia-GSO-P0021-std.txt (35 kB)
- ROG-Dia-GSO-P0009-std.txt (34 kB)
- ROG-Dia-GSO-P0025-pog.txt (28 kB)
- ROG-Dia-GSO-P0018-pog.txt (33 kB)
- ROG-Dia-GSO-P0022-pog.txt (33 kB)
- ROG-Dia-GSO-P0016-pog.txt (25 kB)
- ROG-Dia-GSO-P0025-std.txt (29 kB)
- ROG-Dia-GSO-P0019-std.txt (30 kB)
- TRS
- ROG-Dia-GSO-P0007-pog.trs (131 kB)
- ROG-Dia-GSO-P0016-std.trs (54 kB)
- ROG-Dia-GSO-P0005-pog.trs (77 kB)
- ROG-Dia-GSO-P0022-std.trs (79 kB)
- ROG-Dia-GSO-P0011-pog.trs (40 kB)
- ROG-Dia-GSO-P0008-std.trs (56 kB)
- ROG-Dia-GSO-P0012-std.trs (58 kB)
- ROG-Dia-GSO-P0007-std.trs (134 kB)
- ROG-Dia-GSO-P0005-std.trs (78 kB)
- ROG-Dia-GSO-P0011-std.trs (41 kB)
- ROG-Dia-GSO-P0019-pog.trs (54 kB)
- ROG-Dia-GSO-P0021-pog.trs (75 kB)
- ROG-Dia-GSO-P0018-std.trs (91 kB)
- ROG-Dia-GSO-P0008-pog.trs (55 kB)
- ROG-Dia-GSO-P0012-pog.trs (57 kB)
- ROG-Dia-GSO-P0021-std.trs (77 kB)
- ROG-Dia-GSO-P0009-std.trs (81 kB)
- ROG-Dia-GSO-P0025-pog.trs (63 kB)
- ROG-Dia-GSO-P0018-pog.trs (89 kB)
- ROG-Dia-GSO-P0016-pog.trs (53 kB)
- ROG-Dia-GSO-P0022-pog.trs (78 kB)
- ROG-Dia-GSO-P0025-std.trs (64 kB)
- ROG-Dia-GSO-P0019-std.trs (55 kB)
- ROG-Dia-GSO-P0009-pog.trs (80 kB)
- EXB
- ROG-Dia-GSO-P0012.exb (476 kB)
- ROG-Dia-GSO-P0025.exb (522 kB)
- ROG-Dia-GSO-P0011.exb (406 kB)
- ROG-Dia-GSO-P0009.exb (638 kB)
- ROG-Dia-GSO-P0008.exb (404 kB)
- ROG-Dia-GSO-P0007.exb (913 kB)
- ROG-Dia-GSO-P0022.exb (580 kB)
- ROG-Dia-GSO-P0021.exb (681 kB)
- ROG-Dia-GSO-P0019.exb (513 kB)
- ROG-Dia-GSO-P0005.exb (672 kB)
- ROG-Dia-GSO-P0018.exb (592 kB)
- ROG-Dia-GSO-P0016.exb (428 kB)
- PREBERIME.md (3 kB)
- README.md (3 kB)
- DOCS
- ROG-Dia-DOC.pdf (101 kB)
- ROG-Dia-meta-speakers.tsv (6 kB)
- ROG-Dia-meta-speeches.tsv (5 kB)
- ROG-Dia-TrainDevTest-split.tsv (298 B)
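Each recording in the listing appears in several formats and transcription variants. The following is a minimal Python sketch of grouping archive members by recording ID. The naming pattern is inferred from the file names in this listing only; reading the `-pog` and `-std` suffixes as the colloquial and standardized transcriptions mentioned in the description is an assumption, and `group_by_recording` is a hypothetical helper, not part of the corpus distribution.

```python
import re
from collections import defaultdict

# Pattern inferred from names in the listing, e.g. "ROG-Dia-GSO-P0007-pog.txt".
# The optional suffix is "-pog", "-std" or "_s"; what each suffix means is an
# assumption, so check README.md in the archive before relying on it.
NAME_RE = re.compile(
    r"^(ROG-Dia-GSO-P\d{4})(?:[-_](pog|std|s))?\.(exb|exs|trs|txt|wav)$"
)

def group_by_recording(filenames):
    """Map each recording ID to a sorted list of (extension, variant) pairs."""
    groups = defaultdict(list)
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any folder prefix inside the zip
        match = NAME_RE.match(base)
        if match is None:
            continue  # skips README.md, ROG-Dia.coma, the .tsv metadata, folders
        rec_id, variant, ext = match.groups()
        groups[rec_id].append((ext, variant or ""))
    return {rec_id: sorted(items) for rec_id, items in groups.items()}

if __name__ == "__main__":
    sample = [
        "ROG-Dialog/DATA/TXT/ROG-Dia-GSO-P0007-pog.txt",
        "ROG-Dialog/DATA/TXT/ROG-Dia-GSO-P0007-std.txt",
        "ROG-Dialog/DATA/EXB/ROG-Dia-GSO-P0007.exb",
    ]
    for rec_id, items in group_by_recording(sample).items():
        print(rec_id, items)
```

With a downloaded archive, the same function can be fed `zipfile.ZipFile("ROG-Dialog.zip").namelist()` from the standard library; the folder layout used in the sample paths is a guess based on this listing.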
- Name: ROG-Dialog_audio.zip
- Size: 1.22 GB
- Format: application/zip
- Description: ROG Dialog audio files
- MD5: 3a63f5a83e9ba1ec280890d3467473f5
- ROG-Dialog
- DATA
- WAV
- ROG-Dia-GSO-P0012.wav (104 MB)
- ROG-Dia-GSO-P0025.wav (170 MB)
- ROG-Dia-GSO-P0009.wav (135 MB)
- ROG-Dia-GSO-P0011.wav (97 MB)
- ROG-Dia-GSO-P0008.wav (116 MB)
- ROG-Dia-GSO-P0007.wav (149 MB)
- ROG-Dia-GSO-P0022.wav (121 MB)
- ROG-Dia-GSO-P0021.wav (168 MB)
- ROG-Dia-GSO-P0019.wav (169 MB)
- ROG-Dia-GSO-P0005.wav (100 MB)
- ROG-Dia-GSO-P0018.wav (104 MB)
- ROG-Dia-GSO-P0016.wav (170 MB)
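The record lists an MD5 checksum for each of the two archives. The following is a minimal Python sketch of checking a downloaded file against those values, using only the standard library. The checksums are copied from this record; `md5_of_file` and `verify` are hypothetical helper names, and the sketch assumes the downloads keep the file names shown in the record.

```python
import hashlib
import os

# Checksums copied from the MD5 fields of this item record.
EXPECTED_MD5 = {
    "ROG-Dialog.zip": "1dffd3e9703b0389ed4cb8deeb19223e",
    "ROG-Dialog_audio.zip": "3a63f5a83e9ba1ec280890d3467473f5",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, name=None):
    """Compare a file's MD5 with the value recorded for `name`.

    `name` defaults to the file's base name, which assumes the download
    was saved under the name shown in the record.
    """
    key = name if name is not None else os.path.basename(path)
    return md5_of_file(path) == EXPECTED_MD5[key]

if __name__ == "__main__":
    for archive in EXPECTED_MD5:
        if os.path.exists(archive):
            print(archive, "OK" if verify(archive) else "MISMATCH")
        else:
            print(archive, "not found in the current directory")
```

Reading in chunks keeps memory use flat for the 1.22 GB audio archive. MD5 here only guards against a corrupted or truncated download; it is not a security check.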