dc.contributor.author | Verdonik, Darinka |
dc.contributor.author | Bizjak, Andreja |
dc.contributor.author | Žgank, Andrej |
dc.contributor.author | Bernjak, Mitja |
dc.contributor.author | Antloga, Špela |
dc.contributor.author | Majhenič, Simona |
dc.contributor.author | Čakš, Peter |
dc.contributor.author | Pucer, Matevž |
dc.contributor.author | Cvetko, Mitja |
dc.contributor.author | Zelenik, Marijana |
dc.contributor.author | Pavlič, Jani |
dc.contributor.author | Dobrišek, Simon |
dc.contributor.author | Križaj, Janez |
dc.contributor.author | Strle, Gregor |
dc.contributor.author | Ivanovska, Marija |
dc.contributor.author | Grm, Klemen |
dc.contributor.author | Bajec, Marko |
dc.contributor.author | Lebar Bajec, Iztok |
dc.contributor.author | Jelovšek, Tjaša |
dc.contributor.author | Lokovšek, Jure |
dc.contributor.author | Longyka, Jure |
dc.contributor.author | Trojar, Mitja |
dc.contributor.author | Žganec Gros, Jerneja |
dc.contributor.author | Mihelič, Aleš |
dc.contributor.author | Vesnicer, Boštjan |
dc.contributor.author | Dretnik, Naum |
dc.contributor.author | Bordon, David |
dc.date.accessioned | 2023-03-06T10:10:46Z |
dc.date.available | 2023-03-06T10:10:46Z |
dc.date.issued | 2023-02-27 |
dc.identifier.uri | http://hdl.handle.net/11356/1776 |
dc.description | Artur 1.0 is a speech database designed for automatic speech recognition of Slovenian. The database comprises 1,067 hours of speech: 884 hours are transcribed, while the remaining 183 hours are untranscribed recordings. This repository entry contains the audio files only; the transcriptions are available at http://hdl.handle.net/11356/1772. The data are structured as follows: (1) Artur-B, read speech, 573 hours in total, comprising: (1a) Artur-B-Brani, 485 hours: readings of sentences pre-selected from a 10% increment in the Gigafida 2.0 corpus. The sentences were chosen so that they reflect the natural distribution of triphones in words. They were distributed among 1,000 speakers, with approximately 30 minutes of read speech recorded from each speaker. The speakers were balanced by gender, age and region, and a small proportion of them were non-native speakers of Slovene. Each sentence is a separate audio file with a corresponding transcription file. (1b) Artur-B-Crkovani, 10 hours: spellings. Speakers were asked to spell abbreviations as well as personal names and surnames, chosen so that all Slovene letters, plus the most common foreign letters, were covered. (1c) Artur-B-Studio, 51 hours: designed for the development of speech synthesis. The sentences were read in a studio by a single speaker. Each sentence is a separate audio file with a corresponding transcription file. (1d) Artur-B-Izloceno, 27 hours: recordings containing various types of errors, typically incorrectly read sentences or a noisy environment. (2) Artur-J, public speech, 62 hours in total, comprising: (2a) Artur-J-Splosni, 62 hours: media recordings, online recordings of conferences and workshops, educational videos, etc. (3) Artur-N, private speech, 74 hours in total, comprising: (3a) Artur-N-Obrazi, 6 hours: speakers were asked to describe faces in pictures; designed for domain-specific recognition of face descriptions. (3b) Artur-N-PDom, 7 hours: speakers were asked to read pre-written sentences and to formulate instructions for a potential smart-home system freely; designed for domain-specific recognition in the smart-home domain. (3c) Artur-N-Prosti, 61 hours: monologues and dialogues between two persons, recorded specifically for the Artur database. Speakers were asked to converse or talk freely on casual topics. (4) Artur-P, parliamentary speech, 201 hours in total, comprising: (4a) Artur-P-SejeDZ, 201 hours: speech from the Slovene National Assembly. Further information on the database is available in the Artur-DOC documentation, which is part of this repository entry. |
dc.language.iso | slv |
dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.publisher | Faculty of Electrical Engineering, University of Ljubljana |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.publisher | Alpineon d.o.o. |
dc.publisher | STA |
dc.relation.replaces | http://hdl.handle.net/11356/1717 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://rsdo.slovenscina.eu/en/speech-technologies |
dc.subject | speech database |
dc.subject | automatic speech recognition |
dc.subject | spoken language |
dc.subject | spoken corpus |
dc.title | ASR database ARTUR 1.0 (audio) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | audio |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
contact.person | Simon Dobrišek simon.dobrisek@fe.uni-lj.si Faculty of Electrical Engineering, University of Ljubljana |
contact.person | Marko Bajec marko.bajec@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
contact.person | Jerneja Žganec Gros jerneja.gros@alpineon.si Alpineon d.o.o. |
contact.person | Naum Dretnik nd@sta.si STA |
sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
size.info | 1067 hours |
files.count | 39 |
files.size | 348463080641 |
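The totals stated in this record can be cross-checked with simple arithmetic. The Python sketch below copies the hour counts from dc.description and the values of size.info and files.size; the constant and function names are illustrative only and are not part of the corpus distribution. Interpreted in binary units, files.size comes to about 324.5 GiB, which is close to the sum of the archive sizes in the file listing, suggesting that those sizes are given in GiB.

```python
# Figures copied from the dc.description, size.info and files.size fields.
# All names below are hypothetical helpers for illustration.
SUBSETS_HOURS = {
    "Artur-B": {"Artur-B-Brani": 485, "Artur-B-Crkovani": 10,
                "Artur-B-Studio": 51, "Artur-B-Izloceno": 27},
    "Artur-J": {"Artur-J-Splosni": 62},
    "Artur-N": {"Artur-N-Obrazi": 6, "Artur-N-PDom": 7, "Artur-N-Prosti": 61},
    "Artur-P": {"Artur-P-SejeDZ": 201},
}
STATED_PART_TOTALS = {"Artur-B": 573, "Artur-J": 62, "Artur-N": 74, "Artur-P": 201}
TRANSCRIBED_HOURS = 884
UNTRANSCRIBED_HOURS = 183
TOTAL_HOURS = 1067                  # size.info
FILES_SIZE_BYTES = 348_463_080_641  # files.size


def part_totals(subsets):
    """Sum the hours of all subsets belonging to each part (B, J, N, P)."""
    return {part: sum(hours.values()) for part, hours in subsets.items()}


def bytes_to_gib(n_bytes):
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for part, hours in part_totals(SUBSETS_HOURS).items():
        print(f"{part}: {hours} h (stated: {STATED_PART_TOTALS[part]} h)")
    print("transcribed + untranscribed =",
          TRANSCRIBED_HOURS + UNTRANSCRIBED_HOURS, "h")
    print(f"files.size = {bytes_to_gib(FILES_SIZE_BYTES):.2f} GiB")
```

Each part's subsets add up to the part total given in the description, and the transcribed and untranscribed hours add up to the 1,067 hours in size.info.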
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
- Name
- 00Readme.txt
- Size
- 5.06 KB
- Format
- Text file
- Description
- An overview of the Artur corpus directory structure in English
- MD5
- 9614f5c5a2e8fca7c2134e4c09065f5c
00Readme.txt - this file
Artur-DOC - Artur database documentation folder
Artur-Opis.pdf - description of the Artur speech-database in Slovenian
Artur-Description.pdf - description of the Artur speech-database in English
Artur-Oznake.pdf - description of tags used in the Artur database
Artur-PogovorniZapis.pdf - specification of pronunciation-based transcription
Artur-StandardiziraniZapis.pdf - specification of standardised transcription
Artur-B - read speech documentation folder
Artur-IzborPovedi.pdf - description of sentence selection for read speech
Artur-J - public speech documentation folder
Artur-N - non-public speech documentation folder
Artur-P - parliamentary speech documentation folder
Artur-TRS - trs-format transcription folder
Artur-B - read speech files folder
00Artur-B-Govorci.tsv - read speech speakers data file in Slovenian
00Artur-B-Posnetki.tsv - read speech recordings data file in Slovenian
00Artur-B-Speake . . .
- Name
- 00Preberime.txt
- Size
- 5.21 KB
- Format
- Text file
- Description
- An overview of the Artur corpus directory structure in Slovenian
- MD5
- 270c89460cafea628e0cc7bc0a2d9c5e
00Preberime.txt - this file
Artur-DOC - folder with documentation on the Artur database
Artur-Opis.pdf - description of the entire Artur speech collection
Artur-Description.pdf - description of the entire Artur speech collection in English
Artur-Oznake.pdf - information on all tags used in the Artur speech collection
Artur-StandardiziraniZapis.pdf - information on the standardised transcription and annotation of speech in the Artur speech collection
Artur-PogovorniZapis.pdf - information on the colloquial transcription and annotation of speech in the Artur speech collection
Artur-B - folder with documentation on read speech
Artur-IzborPovedi.pdf - information on how sentences were selected for read speech
Artur-J - folder with documentation on public speech
Artur-N - folder with documentation on non-public speech
Artur-P - folder with documentation on parliamentary speech
Artur-TRS - folder with transcriptions in trs format
Artur-B - folder with read speech files
00Artur-B-Govorci.tsv - file with data on read speech speakers
00A . . .
- Name
- Artur_1.0_DOC.tgz
- Size
- 3.6 MB
- Format
- Unknown
- Description
- The GZIP TAR archive of the Artur corpus documentation files
- MD5
- 4045968d55f01c2b8aca61da844a480e
- Name
- Artur-B-Audio_00.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 0
- MD5
- 199738f314b1fe3e1d96f221020caf4e
- Name
- Artur-B-Audio_01.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 1
- MD5
- bb44bbd6ec508837d673b7d2d27073ed
- Name
- Artur-B-Audio_02.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 2
- MD5
- 0c215a3b7577bfca2a39206a0ecd3ee5
- Name
- Artur-B-Audio_03.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 3
- MD5
- 7063c632e41e0f1ce9525c6edb61bc73
- Name
- Artur-B-Audio_04.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 4
- MD5
- 98d602e339ebb66c7edc302c9d2c4b57
- Name
- Artur-B-Audio_05.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 5
- MD5
- e219bc1b32c5937cb0ca19ff357fb595
- Name
- Artur-B-Audio_06.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 6
- MD5
- defab40077a4cb82074faa575d69358b
- Name
- Artur-B-Audio_07.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 7
- MD5
- 2aec1524c539c779933e1ebb0800c000
- Name
- Artur-B-Audio_08.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 8
- MD5
- 6009c0b22a53199f7d4db7c84b223582
- Name
- Artur-B-Audio_09.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 9
- MD5
- 0edac95c5d62c309e10833358b3ee105
- Name
- Artur-B-Audio_10.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 10
- MD5
- 7360ca2dfd00a59692afc602029b414c
- Name
- Artur-B-Audio_11.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 11
- MD5
- fea55dff462348ae6cb4f719d7cfb6e3
- Name
- Artur-B-Audio_12.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 12
- MD5
- 26b280c2ff578a4757cd4463c335415b
- Name
- Artur-B-Audio_13.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 13
- MD5
- 6349791a6a0200129942e1737372fe22
- Name
- Artur-B-Audio_14.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 14
- MD5
- a136e410827ded92c374d16f7be567b0
- Name
- Artur-B-Audio_15.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 15
- MD5
- 9eb284e6231959cc6c0cca87ea8f15db
- Name
- Artur-B-Audio_16.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 16
- MD5
- 52abde6e78abb64e8f31e40d9dda28f3
- Name
- Artur-B-Audio_17.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 17
- MD5
- 5360204e426e5607c6ef579d0086008d
- Name
- Artur-B-Audio_18.tar
- Size
- 8.97 GB
- Format
- Unknown
- Description
- Audio files, part B, tar 18
- MD5
- 3ac6ee1b12678eb8d55a60590e30aff0
- Name
- Artur-J-Audio_00.tar
- Size
- 8.81 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 0
- MD5
- bc8b4e0625fce2b47d99ed7da8db7393
- Name
- Artur-J-Audio_01.tar
- Size
- 8.96 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 1
- MD5
- 6e4e6684a424d8efeefe1c891536899d
- Name
- Artur-J-Audio_02.tar
- Size
- 10.19 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 2
- MD5
- cbb2f55cfe5c700864e0662cd78d68ee
- Name
- Artur-J-Audio_03.tar
- Size
- 8.99 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 3
- MD5
- 061dbca9e31490fd51b5f15aa722e003
- Name
- Artur-J-Audio_04.tar
- Size
- 8.91 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 4
- MD5
- 0c5706d91653b7a8565c8759c30991a0
- Name
- Artur-J-Audio_05.tar
- Size
- 8.83 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 5
- MD5
- 01bf0bd477e8e6b93f9b19bf3bed7c08
- Name
- Artur-J-Audio_06.tar
- Size
- 6.81 GB
- Format
- Unknown
- Description
- Audio files, part J, tar 6
- MD5
- 26196913a55cd011422e2db0cf2a2836
- Name
- Artur-N-Audio_00.tar
- Size
- 8.26 GB
- Format
- Unknown
- Description
- Audio files, part N, tar 0
- MD5
- 49b9ed5ba46db4b5e784350580b3cc04
- Name
- Artur-N-Audio_01.tar
- Size
- 8.35 GB
- Format
- Unknown
- Description
- Audio files, part N, tar 1
- MD5
- eae257adf67dba3a981fb85cd8332a4c
- Name
- Artur-N-Audio_02.tar
- Size
- 8.31 GB
- Format
- Unknown
- Description
- Audio files, part N, tar 2
- MD5
- 1bb3e409f81d808d9c2c4a7333e497ea
- Name
- Artur-N-Audio_03.tar
- Size
- 8.09 GB
- Format
- Unknown
- Description
- Audio files, part N, tar 3
- MD5
- 924b3d8c2b07e95ba56c7addcebab59f
- Name
- Artur-P-Audio_00.tar
- Size
- 10 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 0
- MD5
- 75951a093d83d4bb17e1a9b3458be586
- Name
- Artur-P-Audio_01.tar
- Size
- 9.96 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 1
- MD5
- e267e5a0ca1f8a1d4b1476288fec12df
- Name
- Artur-P-Audio_02.tar
- Size
- 9.94 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 2
- MD5
- acecaea4c2d76b03842a9b6457600473
- Name
- Artur-P-Audio_03.tar
- Size
- 9.94 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 3
- MD5
- 40e5aa3ee79d247e6a9ad6a3f0baa3dc
- Name
- Artur-P-Audio_04.tar
- Size
- 9.93 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 4
- MD5
- e3bc5137b57e484b061ed472b33b6e89
- Name
- Artur-P-Audio_05.tar
- Size
- 9.79 GB
- Format
- Unknown
- Description
- Audio files, part P, tar 5
- MD5
- 5416dacba46615d74bdb315628bb8570
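Because each archive is roughly 9 GB, it is worth checking downloads against the MD5 checksums listed above before unpacking them. The following Python sketch streams each file through `hashlib.md5` in 1 MiB chunks so that no archive is loaded into memory at once. The script layout, the `EXPECTED_MD5` table and the function names are illustrative assumptions; only two checksums are copied from the listing, and the remaining archives would be added in the same way.

```python
import hashlib
import sys
from pathlib import Path

# Expected checksums copied from the file listing; extend with the other archives.
EXPECTED_MD5 = {
    "Artur_1.0_DOC.tgz": "4045968d55f01c2b8aca61da844a480e",
    "Artur-B-Audio_00.tar": "199738f314b1fe3e1d96f221020caf4e",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Check every expected archive in `directory` against its listed checksum.

    Returns a dict mapping each file name to "ok", "MISMATCH" or "missing".
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, status in verify(target).items():
        print(f"{status:8} {name}")
```

Saved under a hypothetical name such as `verify_artur.py`, it would be run as `python verify_artur.py /path/to/downloads` and prints one status line per archive. Archives that pass can then be unpacked with `tar -xf Artur-B-Audio_00.tar`, and the gzip-compressed documentation with `tar -xzf Artur_1.0_DOC.tgz`.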