dc.contributor.author | Verdonik, Darinka |
dc.contributor.author | Bizjak, Andreja |
dc.contributor.author | Žgank, Andrej |
dc.contributor.author | Bernjak, Mitja |
dc.contributor.author | Antloga, Špela |
dc.contributor.author | Majhenič, Simona |
dc.contributor.author | Čakš, Peter |
dc.contributor.author | Pucer, Matevž |
dc.contributor.author | Cvetko, Mitja |
dc.contributor.author | Zelenik, Marijana |
dc.contributor.author | Pavlič, Jani |
dc.contributor.author | Dobrišek, Simon |
dc.contributor.author | Križaj, Janez |
dc.contributor.author | Strle, Gregor |
dc.contributor.author | Ivanovska, Marija |
dc.contributor.author | Grm, Klemen |
dc.contributor.author | Bajec, Marko |
dc.contributor.author | Lebar Bajec, Iztok |
dc.contributor.author | Jelovšek, Tjaša |
dc.contributor.author | Lokovšek, Jure |
dc.contributor.author | Longyka, Jure |
dc.contributor.author | Trojar, Mitja |
dc.contributor.author | Žganec Gros, Jerneja |
dc.contributor.author | Mihelič, Aleš |
dc.contributor.author | Vesnicer, Boštjan |
dc.contributor.author | Dretnik, Naum |
dc.contributor.author | Bordon, David |
dc.date.accessioned | 2022-12-06T15:29:30Z |
dc.date.available | 2022-12-06T15:29:30Z |
dc.date.issued | 2022-12-01 |
dc.identifier.uri | http://hdl.handle.net/11356/1717 |
dc.description | ARTUR is a speech database designed for automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, of which 840 hours are transcribed and the remaining 195 hours are not. The data is divided into 4 parts: (1) approx. 520 hours of read speech, consisting of readings of pre-defined sentences selected from the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320); each sentence is contained in one file; speakers are demographically balanced; spelling is included in separate files; all with manual transcriptions; (2) approx. 204 hours of public speech, which includes media recordings, online recordings of conferences, workshops, educational videos, etc.; 56 hours are manually transcribed; (3) approx. 110 hours of private speech, which includes monologues and dialogues between two persons, recorded for the purposes of the speech database; the speakers are demographically balanced; two subsets for domain-specific ASR (i.e., smart-home and face-description) are included; 63 hours are manually transcribed; (4) approx. 201 hours of parliamentary speech, which includes recordings from the Slovene National Assembly, all with manual transcriptions. Audio files are WAV, 44.1 kHz, PCM, 16-bit, mono. This entry includes the recordings only; transcriptions are available at http://hdl.handle.net/11356/1718. |
dc.language.iso | slv |
dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.publisher | Faculty of Electrical Engineering, University of Ljubljana |
dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
dc.publisher | Alpineon d.o.o. |
dc.publisher | STA |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1776 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://slovenscina.eu/ |
dc.subject | speech database |
dc.subject | automatic speech recognition |
dc.subject | spoken language |
dc.subject | spoken corpus |
dc.title | ASR database ARTUR 0.1 (audio) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | audio |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
contact.person | Simon Dobrišek simon.dobrisek@fe.uni-lj.si Faculty of Electrical Engineering, University of Ljubljana |
contact.person | Marko Bajec marko.bajec@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
contact.person | Jerneja Žganec Gros jerneja.gros@alpineon.si Alpineon d.o.o. |
contact.person | Naum Dretnik nd@sta.si STA |
sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
size.info | 1035 hours |
files.count | 36 |
files.size | 330354797219 |
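The figures in the description can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the record: it sums the per-part hours, estimates the raw PCM payload implied by the stated audio format (44.1 kHz, 16-bit, mono), and compares that estimate with `files.size`. The names `HOURS`, `TRANSCRIBED_HOURS`, `estimated_pcm_bytes` and `matches_stated_format` are illustrative assumptions, and the per-part hours are the approximate values quoted in the description.

```python
import wave

# Approximate hours per part, copied from the record's description.
HOURS = {"read": 520, "public": 204, "private": 110, "parliamentary": 201}
# Transcribed hours per part (read and parliamentary speech are fully transcribed).
TRANSCRIBED_HOURS = {"read": 520, "public": 56, "private": 63, "parliamentary": 201}

SAMPLE_RATE_HZ = 44_100             # 44.1 kHz
SAMPLE_WIDTH_BYTES = 2              # 16-bit PCM
CHANNELS = 1                        # mono
FILES_SIZE_BYTES = 330_354_797_219  # files.size from the record


def estimated_pcm_bytes(hours):
    """Raw PCM payload for a duration, ignoring WAV headers and tar overhead."""
    return int(hours * 3600 * SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


def matches_stated_format(path):
    """Return True if a WAV file has the stated sample rate, bit depth and channel count."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == SAMPLE_RATE_HZ
            and wav.getsampwidth() == SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == CHANNELS
        )


total_hours = sum(HOURS.values())
transcribed_hours = sum(TRANSCRIBED_HOURS.values())
size_ratio = FILES_SIZE_BYTES / estimated_pcm_bytes(total_hours)

print(f"total: {total_hours} h, transcribed: {transcribed_hours} h, "
      f"untranscribed: {total_hours - transcribed_hours} h")
print(f"files.size / estimated PCM payload = {size_ratio:.3f}")
```

The per-part hours add up to the stated totals, and the estimated payload comes out within about one percent of `files.size`; the remainder is plausibly headers, archive overhead and rounding in the approximate hour counts. `matches_stated_format` can be pointed at any extracted recording to confirm the format on real data.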
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name
- 00Preberime.txt
- Size
- 1.66 KB
- Format
- Text file
- Description
- Corpus structure description
- MD5
- 711dc7f4ecabc3270fb20eeb6ced77f5
00Preberime.txt - this file
Artur-WAV - folder with audio recordings in WAV format
  Artur-B - folder with read-speech files
    Artur-B-Brani - folder with audio recordings of read speech
      Artur-B-G0001 etc. - folders with the audio recordings of an individual speaker
    Artur-B-Crkovani - folder with audio recordings of spelling
      Artur-B-G0501 etc. - folders with the audio recordings of an individual speaker
    Artur-B-Izloceno - folder with audio recordings that deviate from the specified criteria (poor WAV quality, reading errors, ...)
      Artur-B-G0001 etc. - folders with the audio recordings of an individual speaker
    Artur-B-Nerazporejeno - folder with audio recordings that have not yet been sorted
      Artur-B-G0058 etc. - folders with the audio recordings of an individual speaker
  Artur-J - folder with public-speech files
    Artur-J-Splosni-RT - folder with manually transcribed audio recordings of public speech
    Artur-J-Splosni-AT - folder with untranscribed audio recordings of public speech
  Artur-N - folder with non-public (private) speech files
    Artur-N-Ob . . .
- Name
- Artur0.1-Audio_00.tar
- Size
- 8.81 GB
- Format
- Unknown
- Description
- Audio files, tar 00
- MD5
- 612814c92e288ae3daf1a7cb22e55c98
- Name
- Artur0.1-Audio_01.tar
- Size
- 8.91 GB
- Format
- Unknown
- Description
- Audio files, tar 01
- MD5
- 0b1e2315a0cbda33a1e7118a270472fe
- Name
- Artur0.1-Audio_02.tar
- Size
- 9.15 GB
- Format
- Unknown
- Description
- Audio files, tar 02
- MD5
- 83c442f147d62c80496d27ae23de0551
- Name
- Artur0.1-Audio_03.tar
- Size
- 8.93 GB
- Format
- Unknown
- Description
- Audio files, tar 03
- MD5
- 0f2a0f66a3ba5e13d969af5fd49ea729
- Name
- Artur0.1-Audio_04.tar
- Size
- 8.85 GB
- Format
- Unknown
- Description
- Audio files, tar 04
- MD5
- c20f93a0e6012437fd294da29eefea50
- Name
- Artur0.1-Audio_05.tar
- Size
- 8.9 GB
- Format
- Unknown
- Description
- Audio files, tar 05
- MD5
- 938242ca1899c2b35480e3ab3dd09588
- Name
- Artur0.1-Audio_06.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 06
- MD5
- 93944855eab981b789a736be78e9b84d
- Name
- Artur0.1-Audio_07.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 07
- MD5
- 1d868da709ec24daad63f9e869a6dc80
- Name
- Artur0.1-Audio_08.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 08
- MD5
- 33ed8f3e96b040b79756e2eb63f22799
- Name
- Artur0.1-Audio_09.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 09
- MD5
- 09989cb49d4e155ddd7e9e06e3cf1e88
- Name
- Artur0.1-Audio_10.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 10
- MD5
- ee012742154d0e9b559355d8ba0cc6f5
- Name
- Artur0.1-Audio_11.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 11
- MD5
- 5776d1691f94f972ef4b52fbab2521ae
- Name
- Artur0.1-Audio_12.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 12
- MD5
- e84319c1f15d2f37f47f3d7e881e5eb9
- Name
- Artur0.1-Audio_13.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 13
- MD5
- 39b7e51c09f0c9506916577e917e106a
- Name
- Artur0.1-Audio_14.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 14
- MD5
- b2de7a5eb42c8c7e8c4be522b924c784
- Name
- Artur0.1-Audio_15.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 15
- MD5
- 84c0fce9f71bf7a97edc525e62cc7992
- Name
- Artur0.1-Audio_16.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 16
- MD5
- 64dd5f220cdd787151251c364f272216
- Name
- Artur0.1-Audio_17.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 17
- MD5
- 6028f478b0b5b1d766c477fec26bae08
- Name
- Artur0.1-Audio_18.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 18
- MD5
- 0d44aeb4be268d271a075cc175bd4b6e
- Name
- Artur0.1-Audio_19.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 19
- MD5
- 14b70bcf9fd0d7193b51803419f689fe
- Name
- Artur0.1-Audio_20.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 20
- MD5
- 6a389224398f2383d5afe968559c9bb8
- Name
- Artur0.1-Audio_21.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 21
- MD5
- 9771e1d94cf1f0225d57525bcd3b6269
- Name
- Artur0.1-Audio_22.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 22
- MD5
- 5855ac84c86be99711f42aadca7d2609
- Name
- Artur0.1-Audio_23.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 23
- MD5
- cf7d7c51afb2ac5238faf66d7aa305ff
- Name
- Artur0.1-Audio_24.tar
- Size
- 8.88 GB
- Format
- Unknown
- Description
- Audio files, tar 24
- MD5
- 3380755b578a6aba7713193df1bbe79c
- Name
- Artur0.1-Audio_25.tar
- Size
- 8.82 GB
- Format
- Unknown
- Description
- Audio files, tar 25
- MD5
- a845f921649fc7d8ebeccb7978d8c2e7
- Name
- Artur0.1-Audio_26.tar
- Size
- 8.8 GB
- Format
- Unknown
- Description
- Audio files, tar 26
- MD5
- 95b700313a9f577d55d2441d4c75ecae
- Name
- Artur0.1-Audio_27.tar
- Size
- 8.8 GB
- Format
- Unknown
- Description
- Audio files, tar 27
- MD5
- 46be36ffd0d0361dd5abfe50034e5885
- Name
- Artur0.1-Audio_28.tar
- Size
- 8.82 GB
- Format
- Unknown
- Description
- Audio files, tar 28
- MD5
- 2c7ae97103c44f50cecf2c221813a52d
- Name
- Artur0.1-Audio_29.tar
- Size
- 8.84 GB
- Format
- Unknown
- Description
- Audio files, tar 29
- MD5
- 05b78fa7b81aec6ab95731460779fcb0
- Name
- Artur0.1-Audio_30.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 30
- MD5
- bce5b56eb351e0070226142768829788
- Name
- Artur0.1-Audio_31.tar
- Size
- 8.79 GB
- Format
- Unknown
- Description
- Audio files, tar 31
- MD5
- 6001a0b0b5a11a8e3654e541a960abed
- Name
- Artur0.1-Audio_32.tar
- Size
- 8.85 GB
- Format
- Unknown
- Description
- Audio files, tar 32
- MD5
- 72a345bd7e006957d37d0f32647b45d1
- Name
- Artur0.1-Audio_33.tar
- Size
- 8.84 GB
- Format
- Unknown
- Description
- Audio files, tar 33
- MD5
- cd93ea70f84dca21c9969a13d7137c3f
- Name
- Artur0.1-Audio_34.tar
- Size
- 7.66 GB
- Format
- Unknown
- Description
- Audio files, tar 34
- MD5
- 015f6a173b6b9d6b494ca0f64785dfeb
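Since each file above is listed with an MD5 checksum, downloads can be verified before extraction. The following is a minimal sketch using Python's standard `hashlib`; the names `EXPECTED_MD5`, `md5_of_file` and `verify_downloads` are illustrative assumptions, and only three of the listed checksums are copied in for brevity.

```python
import hashlib
from pathlib import Path

# A few name -> MD5 pairs copied from the file listing above; extend as needed.
EXPECTED_MD5 = {
    "00Preberime.txt": "711dc7f4ecabc3270fb20eeb6ced77f5",
    "Artur0.1-Audio_00.tar": "612814c92e288ae3daf1a7cb22e55c98",
    "Artur0.1-Audio_34.tar": "015f6a173b6b9d6b494ca0f64785dfeb",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_downloads("downloads")`, where `downloads` is a hypothetical local directory holding the archives, returns a status per file. Chunked reading matters here because each archive is several gigabytes.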