
 
dc.contributor.author Verdonik, Darinka
dc.contributor.author Bizjak, Andreja
dc.contributor.author Žgank, Andrej
dc.contributor.author Bernjak, Mitja
dc.contributor.author Antloga, Špela
dc.contributor.author Majhenič, Simona
dc.contributor.author Čakš, Peter
dc.contributor.author Pucer, Matevž
dc.contributor.author Cvetko, Mitja
dc.contributor.author Zelenik, Marijana
dc.contributor.author Pavlič, Jani
dc.contributor.author Dobrišek, Simon
dc.contributor.author Križaj, Janez
dc.contributor.author Strle, Gregor
dc.contributor.author Ivanovska, Marija
dc.contributor.author Grm, Klemen
dc.contributor.author Bajec, Marko
dc.contributor.author Lebar Bajec, Iztok
dc.contributor.author Jelovšek, Tjaša
dc.contributor.author Lokovšek, Jure
dc.contributor.author Longyka, Jure
dc.contributor.author Trojar, Mitja
dc.contributor.author Žganec Gros, Jerneja
dc.contributor.author Mihelič, Aleš
dc.contributor.author Vesnicer, Boštjan
dc.contributor.author Dretnik, Naum
dc.contributor.author Bordon, David
dc.date.accessioned 2023-03-06T10:10:46Z
dc.date.available 2023-03-06T10:10:46Z
dc.date.issued 2023-02-27
dc.identifier.uri http://hdl.handle.net/11356/1776
dc.description Artur 1.0 is a speech database designed for automatic speech recognition (ASR) of Slovene. The database contains 1,067 hours of speech: 884 hours are transcribed, and the remaining 183 hours are untranscribed recordings. This repository entry contains the audio files only; the transcriptions are available at http://hdl.handle.net/11356/1772. The data are structured as follows:
(1) Artur-B, read speech, 573 hours in total:
  (1a) Artur-B-Brani, 485 hours: readings of sentences pre-selected from a 10% subset of the Gigafida 2.0 corpus. The sentences were chosen so that they reflect the natural distribution of triphones in words. They were distributed among 1,000 speakers, so that approximately 30 minutes of read speech were recorded from each speaker. The speakers were balanced by gender, age, and region, and a small proportion of them were non-native speakers of Slovene. Each sentence is a separate audio file with a corresponding transcription file.
  (1b) Artur-B-Crkovani, 10 hours: spelled speech. Speakers were asked to spell abbreviations, first names, and surnames, chosen so that all letters of the Slovene alphabet, plus the most common foreign letters, were covered.
  (1c) Artur-B-Studio, 51 hours: intended for the development of speech synthesis. The sentences were read in a studio by a single speaker. Each sentence is a separate audio file with a corresponding transcription file.
  (1d) Artur-B-Izloceno, 27 hours: recordings containing various types of errors, typically misread sentences or a noisy environment.
(2) Artur-J, public speech, 62 hours in total:
  (2a) Artur-J-Splosni, 62 hours: media recordings, online recordings of conferences and workshops, educational videos, etc.
(3) Artur-N, private speech, 74 hours in total:
  (3a) Artur-N-Obrazi, 6 hours: speakers were asked to describe faces in pictures. Intended for domain-specific speech recognition of face descriptions.
  (3b) Artur-N-PDom, 7 hours: speakers were asked both to read prepared sentences and to freely formulate instructions for a potential smart-home system. Intended for domain-specific speech recognition in smart-home applications.
  (3c) Artur-N-Prosti, 61 hours: monologues and dialogues between two persons, recorded specifically for the Artur database. Speakers were asked to converse or talk freely about casual topics.
(4) Artur-P, parliamentary speech, 201 hours in total:
  (4a) Artur-P-SejeDZ, 201 hours: speech from the Slovene National Assembly.
Further information on the database is available in the Artur-DOC documentation, which is part of this repository entry.
dc.language.iso slv
dc.publisher Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.publisher Faculty of Electrical Engineering, University of Ljubljana
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.publisher Alpineon d.o.o.
dc.publisher STA
dc.relation.replaces http://hdl.handle.net/11356/1717
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/en/speech-technologies
dc.subject speech database
dc.subject automatic speech recognition
dc.subject spoken language
dc.subject spoken corpus
dc.title ASR database ARTUR 1.0 (audio)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding CLARIN.SI data & tools
contact.person Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor
contact.person Simon Dobrišek simon.dobrisek@fe.uni-lj.si Faculty of Electrical Engineering, University of Ljubljana
contact.person Marko Bajec marko.bajec@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
contact.person Jerneja Žganec Gros jerneja.gros@alpineon.si Alpineon d.o.o.
contact.person Naum Dretnik nd@sta.si STA
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 1067 hours
files.count 39
files.size 348463080641
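The figures in this record can be cross-checked with a little arithmetic. The sketch below is only an illustration: the dictionary layout and the names `SUBCORPORA_HOURS`, `part_totals` and `bytes_to_units` are hypothetical, the numbers are copied from `dc.description`, `size.info` and `files.size`, and it assumes that `files.size` is given in bytes.

```python
# Sanity check of the figures in the ARTUR 1.0 (audio) record.
# Hours are copied from dc.description; the dict layout is only an illustration.

SUBCORPORA_HOURS = {
    "Artur-B": {"Artur-B-Brani": 485, "Artur-B-Crkovani": 10,
                "Artur-B-Studio": 51, "Artur-B-Izloceno": 27},
    "Artur-J": {"Artur-J-Splosni": 62},
    "Artur-N": {"Artur-N-Obrazi": 6, "Artur-N-PDom": 7, "Artur-N-Prosti": 61},
    "Artur-P": {"Artur-P-SejeDZ": 201},
}

# Part totals as stated in dc.description.
STATED_PART_TOTALS = {"Artur-B": 573, "Artur-J": 62, "Artur-N": 74, "Artur-P": 201}

TRANSCRIBED_HOURS = 884
UNTRANSCRIBED_HOURS = 183
TOTAL_HOURS = 1067                # size.info
FILES_SIZE_BYTES = 348463080641   # files.size (assumed to be bytes)


def part_totals(subcorpora):
    """Sum the subcorpus hours within each part."""
    return {part: sum(subs.values()) for part, subs in subcorpora.items()}


def bytes_to_units(n_bytes):
    """Return the size in decimal gigabytes (10**9) and binary gibibytes (2**30)."""
    return n_bytes / 10**9, n_bytes / 2**30


if __name__ == "__main__":
    totals = part_totals(SUBCORPORA_HOURS)
    for part, hours in totals.items():
        status = "ok" if hours == STATED_PART_TOTALS[part] else "MISMATCH"
        print(f"{part}: {hours} h ({status})")
    print("transcribed + untranscribed =", TRANSCRIBED_HOURS + UNTRANSCRIBED_HOURS,
          "h; size.info =", TOTAL_HOURS, "h")
    gb, gib = bytes_to_units(FILES_SIZE_BYTES)
    print(f"files.size = {gb:.1f} GB = {gib:.1f} GiB")
```

Run as a script, it reports each part as `ok` and prints `files.size = 348.5 GB = 324.5 GiB`, which gives a rough idea of the disk space needed for the full audio download.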


Files in this item

Name: 00Readme.txt
Size: 5.06 KB
Format: Text file
Description: An overview of the Artur corpus directory structure in English
MD5: 9614f5c5a2e8fca7c2134e4c09065f5c
Preview:
00Readme.txt		- this file
Artur-DOC			- Artur database documentation folder
	Artur-Opis.pdf			- description of the Artur speech-database in Slovenian
	Artur-Description.pdf		- description of the Artur speech-database in English
	Artur-Oznake.pdf		- description of tags used in the Artur database
	Artur-PogovorniZapis.pdf	- specification of pronunciation-based transcription
	Artur-StandardiziraniZapis.pdf	- specification of standardised transcription 
	Artur-B				- read speech documentation folder
		Artur-IzborPovedi.pdf		- description of sentence selection for read speech
	Artur-J				- public speech documentation folder
	Artur-N				- non-public speech documentation folder
	Artur-P				- parliamentary speech documentation folder
Artur-TRS			- trs-format transcription folder
	Artur-B				- read speech files folder
		00Artur-B-Govorci.tsv		- read speech speakers data file in Slovenian
		00Artur-B-Posnetki.tsv		- read speech recordings data file in Slovenian
		00Artur-B-Speake . . .

Name: 00Preberime.txt
Size: 5.21 KB
Format: Text file
Description: An overview of the Artur corpus directory structure in Slovenian
MD5: 270c89460cafea628e0cc7bc0a2d9c5e
Preview:
00Preberime.txt		- this file
Artur-DOC			- folder with documentation on the Artur database
	Artur-Opis.pdf			- description of the entire Artur speech collection
	Artur-Description.pdf		- description of the entire Artur speech collection in English
	Artur-Oznake.pdf		- information on all tags used in the Artur speech collection
	Artur-StandardiziraniZapis.pdf	- information on the standardised transcription and annotation of speech in the Artur speech collection
	Artur-PogovorniZapis.pdf	- information on the pronunciation-based transcription and annotation of speech in the Artur speech collection
	Artur-B				- folder with documentation on read speech
		Artur-IzborPovedi.pdf		- information on how sentences were selected for read speech
	Artur-J				- folder with documentation on public speech
	Artur-N				- folder with documentation on non-public speech
	Artur-P				- folder with documentation on parliamentary speech
Artur-TRS			- folder with transcriptions in trs format
	Artur-B				- folder with read speech files
		00Artur-B-Govorci.tsv		- file with data on the read speech speakers
		00A . . .

Name: Artur_1.0_DOC.tgz
Size: 3.6 MB
Format: Unknown
Description: A gzip-compressed TAR archive of the Artur corpus documentation files
MD5: 4045968d55f01c2b8aca61da844a480e

Name                  Size      Format   Description                  MD5
Artur-B-Audio_00.tar  8.97 GB   Unknown  Audio files, part B, tar 0   199738f314b1fe3e1d96f221020caf4e
Artur-B-Audio_01.tar  8.97 GB   Unknown  Audio files, part B, tar 1   bb44bbd6ec508837d673b7d2d27073ed
Artur-B-Audio_02.tar  8.97 GB   Unknown  Audio files, part B, tar 2   0c215a3b7577bfca2a39206a0ecd3ee5
Artur-B-Audio_03.tar  8.97 GB   Unknown  Audio files, part B, tar 3   7063c632e41e0f1ce9525c6edb61bc73
Artur-B-Audio_04.tar  8.97 GB   Unknown  Audio files, part B, tar 4   98d602e339ebb66c7edc302c9d2c4b57
Artur-B-Audio_05.tar  8.97 GB   Unknown  Audio files, part B, tar 5   e219bc1b32c5937cb0ca19ff357fb595
Artur-B-Audio_06.tar  8.97 GB   Unknown  Audio files, part B, tar 6   defab40077a4cb82074faa575d69358b
Artur-B-Audio_07.tar  8.97 GB   Unknown  Audio files, part B, tar 7   2aec1524c539c779933e1ebb0800c000
Artur-B-Audio_08.tar  8.97 GB   Unknown  Audio files, part B, tar 8   6009c0b22a53199f7d4db7c84b223582
Artur-B-Audio_09.tar  8.97 GB   Unknown  Audio files, part B, tar 9   0edac95c5d62c309e10833358b3ee105
Artur-B-Audio_10.tar  8.97 GB   Unknown  Audio files, part B, tar 10  7360ca2dfd00a59692afc602029b414c
Artur-B-Audio_11.tar  8.97 GB   Unknown  Audio files, part B, tar 11  fea55dff462348ae6cb4f719d7cfb6e3
Artur-B-Audio_12.tar  8.97 GB   Unknown  Audio files, part B, tar 12  26b280c2ff578a4757cd4463c335415b
Artur-B-Audio_13.tar  8.97 GB   Unknown  Audio files, part B, tar 13  6349791a6a0200129942e1737372fe22
Artur-B-Audio_14.tar  8.97 GB   Unknown  Audio files, part B, tar 14  a136e410827ded92c374d16f7be567b0
Artur-B-Audio_15.tar  8.97 GB   Unknown  Audio files, part B, tar 15  9eb284e6231959cc6c0cca87ea8f15db
Artur-B-Audio_16.tar  8.97 GB   Unknown  Audio files, part B, tar 16  52abde6e78abb64e8f31e40d9dda28f3
Artur-B-Audio_17.tar  8.97 GB   Unknown  Audio files, part B, tar 17  5360204e426e5607c6ef579d0086008d
Artur-B-Audio_18.tar  8.97 GB   Unknown  Audio files, part B, tar 18  3ac6ee1b12678eb8d55a60590e30aff0
Artur-J-Audio_00.tar  8.81 GB   Unknown  Audio files, part J, tar 0   bc8b4e0625fce2b47d99ed7da8db7393
Artur-J-Audio_01.tar  8.96 GB   Unknown  Audio files, part J, tar 1   6e4e6684a424d8efeefe1c891536899d
Artur-J-Audio_02.tar  10.19 GB  Unknown  Audio files, part J, tar 2   cbb2f55cfe5c700864e0662cd78d68ee
Artur-J-Audio_03.tar  8.99 GB   Unknown  Audio files, part J, tar 3   061dbca9e31490fd51b5f15aa722e003
Artur-J-Audio_04.tar  8.91 GB   Unknown  Audio files, part J, tar 4   0c5706d91653b7a8565c8759c30991a0
Artur-J-Audio_05.tar  8.83 GB   Unknown  Audio files, part J, tar 5   01bf0bd477e8e6b93f9b19bf3bed7c08
Artur-J-Audio_06.tar  6.81 GB   Unknown  Audio files, part J, tar 6   26196913a55cd011422e2db0cf2a2836
Artur-N-Audio_00.tar  8.26 GB   Unknown  Audio files, part N, tar 0   49b9ed5ba46db4b5e784350580b3cc04
Artur-N-Audio_01.tar  8.35 GB   Unknown  Audio files, part N, tar 1   eae257adf67dba3a981fb85cd8332a4c
Artur-N-Audio_02.tar  8.31 GB   Unknown  Audio files, part N, tar 2   1bb3e409f81d808d9c2c4a7333e497ea
Artur-N-Audio_03.tar  8.09 GB   Unknown  Audio files, part N, tar 3   924b3d8c2b07e95ba56c7addcebab59f
Artur-P-Audio_00.tar  10 GB     Unknown  Audio files, part P, tar 0   75951a093d83d4bb17e1a9b3458be586
Artur-P-Audio_01.tar  9.96 GB   Unknown  Audio files, part P, tar 1   e267e5a0ca1f8a1d4b1476288fec12df
Artur-P-Audio_02.tar  9.94 GB   Unknown  Audio files, part P, tar 2   acecaea4c2d76b03842a9b6457600473
Artur-P-Audio_03.tar  9.94 GB   Unknown  Audio files, part P, tar 3   40e5aa3ee79d247e6a9ad6a3f0baa3dc
Artur-P-Audio_04.tar  9.93 GB   Unknown  Audio files, part P, tar 4   e3bc5137b57e484b061ed472b33b6e89
Artur-P-Audio_05.tar  9.79 GB   Unknown  Audio files, part P, tar 5   5416dacba46615d74bdb315628bb8570
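Before unpacking, each downloaded archive can be checked against the MD5 value listed for it. The following is a minimal Python sketch using only the standard library (`hashlib`, `tarfile`). The download directory `artur_downloads`, the `extracted` output folder and the helper names `md5_of` and `verify_and_extract` are hypothetical, and only a few checksums are copied into `EXPECTED_MD5` for brevity. The sketch makes no assumption about the directory layout inside the audio archives; it simply unpacks every verified archive into one folder.

```python
import hashlib
import tarfile
from pathlib import Path

# A few MD5 values copied from the file list; add the remaining archives as needed.
EXPECTED_MD5 = {
    "Artur_1.0_DOC.tgz": "4045968d55f01c2b8aca61da844a480e",
    "Artur-B-Audio_00.tar": "199738f314b1fe3e1d96f221020caf4e",
    "Artur-J-Audio_00.tar": "bc8b4e0625fce2b47d99ed7da8db7393",
    "Artur-N-Audio_00.tar": "49b9ed5ba46db4b5e784350580b3cc04",
    "Artur-P-Audio_00.tar": "75951a093d83d4bb17e1a9b3458be586",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, dest):
    """Unpack `archive` into `dest` only if its MD5 matches; return True on success."""
    archive = Path(archive)
    actual = md5_of(archive)
    if actual != expected_md5.lower():
        print(f"{archive.name}: MD5 mismatch (got {actual}), not extracted")
        return False
    # Mode "r:*" auto-detects compression, so it opens both the plain .tar
    # audio parts and the gzip-compressed .tgz documentation archive.
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist only on newer Python releases.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return True


if __name__ == "__main__":
    download_dir = Path("artur_downloads")  # hypothetical download location
    for name, md5 in EXPECTED_MD5.items():
        path = download_dir / name
        if path.exists():
            verify_and_extract(path, md5, download_dir / "extracted")
        else:
            print(f"{name}: not downloaded yet")
```

Reading in fixed-size chunks keeps memory use flat even for the 9 to 10 GB audio parts. Where the `data` extraction filter is available, it rejects archive members that would be written outside the destination folder.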
