| dc.contributor.author | Verdonik, Darinka |
| dc.contributor.author | Bizjak, Andreja |
| dc.contributor.author | Sepesy Maučec, Mirjam |
| dc.contributor.author | Gril, Lucija |
| dc.contributor.author | Dobrišek, Simon |
| dc.contributor.author | Križaj, Janez |
| dc.contributor.author | Strle, Gregor |
| dc.contributor.author | Bajec, Marko |
| dc.contributor.author | Lebar Bajec, Iztok |
| dc.contributor.author | Jelovšek, Tjaša |
| dc.contributor.author | Lokovšek, Jure |
| dc.contributor.author | Trojar, Mitja |
| dc.date.accessioned | 2022-12-06T15:28:54Z |
| dc.date.available | 2022-12-06T15:28:54Z |
| dc.date.issued | 2022-12-06 |
| dc.identifier.uri | http://hdl.handle.net/11356/1718 |
| dc.description | ARTUR is a speech database designed for automatic speech recognition of Slovenian. It comprises 1,035 hours of speech, of which 840 hours are transcribed and the remaining 195 hours are not. The data is divided into four parts: (1) approx. 520 hours of read speech, consisting of readings of pre-defined sentences selected from the Gigafida corpus; each sentence is stored in its own file, speakers are demographically balanced, spelling is included in separate files, and all recordings are manually transcribed; (2) approx. 204 hours of public speech, including media recordings and online recordings of conferences, workshops, educational videos, etc., of which 56 hours are manually transcribed; (3) approx. 110 hours of private speech, consisting of monologues and two-person dialogues recorded specifically for the speech database; speakers are demographically balanced, two subsets for domain-specific ASR (smart-home and face-description) are included, and 63 hours are manually transcribed; (4) approx. 201 hours of parliamentary speech, consisting of recordings from the Slovene National Assembly, all manually transcribed. This repository entry contains only the transcriptions, in Transcriber 1.5.1 TRS format; the audio recordings are available at http://hdl.handle.net/11356/1717. |
| dc.language.iso | slv |
| dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
| dc.publisher | Faculty of Electrical Engineering, University of Ljubljana |
| dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
| dc.publisher | ZRC SAZU |
| dc.relation.isreplacedby | http://hdl.handle.net/11356/1772 |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://slovenscina.eu/ |
| dc.subject | speech database |
| dc.subject | spoken language |
| dc.subject | spoken corpus |
| dc.subject | automatic speech recognition |
| dc.title | ASR database ARTUR 0.1 (transcriptions) |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| hidden | hidden |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
| contact.person | Simon Dobrišek simon.dobrisek@fe.uni-lj.si Faculty of Electrical Engineering, University of Ljubljana |
| contact.person | Marko Bajec marko.bajec@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
| contact.person | Mitja Trojar mitja.trojar@zrc-sazu.si ZRC SAZU |
| sponsor | Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other |
| size.info | 722 hours |
| files.count | 2 |
| files.size | 46743263 |
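
The description above states that the transcriptions are distributed in Transcriber 1.5.1 TRS format, which is XML. The following is a minimal sketch of reading time-stamped segments from such a file with Python's standard library. It assumes the usual Transcriber layout, in which `Turn` elements carry `speaker` and `startTime` attributes and `Sync` elements carry a `time` attribute; this has not been verified against the ARTUR files. The sample document and the function name `read_segments` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TRS document, made up for illustration only.
# It is not taken from the ARTUR corpus.
SAMPLE_TRS = """<!DOCTYPE Trans SYSTEM "trans-14.dtd">
<Trans audio_filename="example" version="1">
<Episode>
<Section type="report" startTime="0" endTime="4.0">
<Turn speaker="spk1" startTime="0" endTime="4.0">
<Sync time="0"/>
dober dan
<Sync time="2.1"/>
kako ste
</Turn>
</Section>
</Episode>
</Trans>
"""


def read_segments(root):
    """Yield (speaker, start_time, text) for each Sync-delimited segment.

    Text that follows other inline elements inside a Turn (for example
    event markers) is kept as part of the current segment.
    """
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker")
        start = float(turn.get("startTime", "0"))
        parts = [turn.text or ""]
        for child in turn:
            if child.tag == "Sync":
                # A Sync element closes the running segment and opens a new one.
                text = " ".join(" ".join(parts).split())
                if text:
                    yield speaker, start, text
                start = float(child.get("time"))
                parts = []
            parts.append(child.tail or "")
        text = " ".join(" ".join(parts).split())
        if text:
            yield speaker, start, text


segments = list(read_segments(ET.fromstring(SAMPLE_TRS)))
for speaker, start, text in segments:
    print(speaker, start, text)
```

For a file on disk, `ET.parse(path).getroot()` can be passed to `read_segments` in place of the parsed sample string.
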
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: Artur_0.1_Transcriptions.tgz
- Size: 44.58 MB
- Format: Unknown
- Description: Unix TAR archive of all the corpus files compressed with GNU Zip
- MD5: 42b28577a7ece758a27abf5bbd2a1e67

- Name: 00Preberime.txt
- Size: 2.17 KB
- Format: Text file
- Description: Corpus file structure description (in Slovenian)
- MD5: 9a0b7646a1b9b552e99678cc15cfe254
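
The MD5 values in the file listing can be used to check a download before unpacking it. The sketch below assumes the archive has already been saved locally; the function names `md5_of_file` and `verify_and_extract` and the target folder name in the usage note are hypothetical choices for illustration, not part of the repository.

```python
import hashlib
import tarfile
from pathlib import Path

# MD5 value copied from the listing for Artur_0.1_Transcriptions.tgz.
EXPECTED_MD5 = "42b28577a7ece758a27abf5bbd2a1e67"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, target_dir):
    """Extract a gzip-compressed tar archive only if its MD5 matches."""
    actual = md5_of_file(archive)
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    return actual
```

A call such as `verify_and_extract("Artur_0.1_Transcriptions.tgz", EXPECTED_MD5, "artur_transcriptions")` would then unpack the corpus into the assumed folder `artur_transcriptions`, or raise `ValueError` if the download is corrupted.
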
00Preberime.txt - this file
Artur-DOC - folder with documentation about the Artur database
  Artur-B - folder with documentation about read speech
  Artur-J - folder with documentation about public speech
  Artur-N - folder with documentation about non-public (private) speech
  Artur-P - folder with documentation about parliamentary speech
Artur-TRS - folder with transcriptions in TRS format
  Artur-B - folder with read speech files
    00Artur-B-Govorci.tsv - file with data on the read speech speakers
    00Artur-B-Posnetki.tsv - file with data on the read speech recordings
    Artur-B-Brani - folder with transcriptions of read speech
      Artur-B-G0001 etc. - folders with transcriptions for each speaker
    Artur-B-Crkovani - folder with transcriptions of spelling
      Artur-B-G0501 etc. - folders with transcriptions for each speaker
    Artur-B-Izloceno - folder with transcriptions that deviate from the specified criteria (poor WAV quality, reading errors...)
      Artur-B-G0001 etc. - folders with transcriptions for individual . . .