dc.contributor.author | Verdonik, Darinka |
dc.contributor.author | Potočnik, Tomaž |
dc.contributor.author | Sepesy Maučec, Mirjam |
dc.contributor.author | Erjavec, Tomaž |
dc.date.accessioned | 2017-10-12T13:35:27Z |
dc.date.available | 2017-10-12T13:35:27Z |
dc.date.issued | 2017-10-11 |
dc.identifier.uri | http://hdl.handle.net/11356/1158 |
dc.description | Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040) and covers public academic speech. The Gos VideoLectures corpus contains a selection of public lectures available through the web portal Videolectures.net, provided by the Jožef Stefan Institute, and covers 9.8 hours of speech. This resource contains only the annotated transcriptions of the corpus – the audio recordings are available at http://hdl.handle.net/11356/1159. All transcriptions for Gos VideoLectures were produced manually and carefully checked. The main transcription guidelines were those of the Gos corpus (http://www.korpus-gos.net/Support/About). The transcription tool Transcriber 1.5.1 (http://trans.sourceforge.net/en/presentation.php) was used to make the transcriptions. It can also be used to read the transcriptions (.trs files) or to export them to other formats. The transcriptions comprise the TRS files with tabular metadata and their conversions to TEI and to the CWB vertical file format. Each recording has two TRS files, one with the pronunciation-based and the other with the standardised/normalised transcription. The TEI and CWB encodings join these two transcriptions at the token level, and the normalised words are also automatically PoS-tagged and lemmatised. The corpus can be used for training continuous speech recognition for the Slovene language, for phonetic research, or for other research on Slovene academic speech. |
dc.language.iso | slv |
dc.publisher | Faculty of Electrical Engineering and Computer Science, University of Maribor |
dc.relation.replaces | http://hdl.handle.net/11356/1069 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1190 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | speech database |
dc.subject | spoken corpus |
dc.subject | academic speech |
dc.subject | speech transcription |
dc.subject | speech recognition |
dc.subject | TEI |
dc.title | Spoken corpus Gos VideoLectures 2.0 (transcription) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | hidden |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Darinka Verdonik darinka.verdonik@um.si Faculty of Electrical Engineering and Computer Science, University of Maribor |
sponsor | Republic of Slovenia, Ministry of Culture 3340-15-141005 Project Gos Videolectures nationalFunds |
size.info | 25 texts |
size.info | 821 utterances |
size.info | 4084 sentences |
size.info | 79420 words |
files.count | 3 |
files.size | 2395551 |
Files in this item

- Name: GosVL.TEI.zip
- Size: 1.12 MB
- Format: application/zip
- Description: Tagged and lemmatised transcriptions in TEI format
- MD5: cedbf26f3a3949dc19a01a82f682373e
- GosVL.TEI
- GosVL.tei.xml (8 kB)
- 00INDEX.txt (4 kB)
- GosVL07_kungf.tei.xml (868 kB)
- GosVL08_droge.tei.xml (190 kB)
- GosVL21_poraz.tei.xml (145 kB)
- GosVL12_cujec.tei.xml (793 kB)
- GosVL20_zumer.tei.xml (123 kB)
- GosVL03_medit.tei.xml (245 kB)
- GosVL05_zsrce.tei.xml (121 kB)
- GosVL23_jeklo.tei.xml (120 kB)
- GosVL10_partn.tei.xml (593 kB)
- GosVL18_aritm.tei.xml (302 kB)
- GosVL19_pujsk.tei.xml (223 kB)
- GosVL13_menin.tei.xml (311 kB)
- GosVL24_inter.tei.xml (386 kB)
- GosVL11_lhise.tei.xml (121 kB)
- GosVL06_kzcoi.tei.xml (402 kB)
- GosVL22_siste.tei.xml (229 kB)
- schema
- tei_gos.zip (46 kB)
- tei_gos.rnc (201 kB)
- tei_gos.dtd (170 kB)
- tei_gos_schema.xml (5 kB)
- tei_gos_doc.html (2 MB)
- tei_gos.rng (435 kB)
- MTE-msd.tei.xml (160 kB)
- GosVL04_fitot.tei.xml (132 kB)
- GosVL09_ocean.tei.xml (467 kB)
- GosVL02_kleme.tei.xml (137 kB)
- GosVL01_pravo.tei.xml (488 kB)
- GosVL15_celia.tei.xml (223 kB)
- GosVL16_stara.tei.xml (371 kB)
- 00README.txt (158 B)
- GosVL25_nanom.tei.xml (243 kB)
- GosVL14_karci.tei.xml (225 kB)
- GosVL17_inten.tei.xml (276 kB)

- Name: GosVL.vert.zip
- Size: 452.89 KB
- Format: application/zip
- Description: Tagged and lemmatised transcriptions in vertical format
- MD5: deef5b653f37d1d1cf361f0b5fa93cc3
- GosVL.vert
- gos_vl.vert (2 MB)
- gos_vl.regi (4 kB)
- 00INDEX.txt (4 kB)
- 00README.txt (158 B)
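
The MD5 values listed in this record for the three archives can be used to check downloaded copies. The following is a minimal sketch, assuming the archives were saved under their listed names in one directory; the function names `md5_of` and `verify_downloads` are illustrative and not part of the resource.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from this record.
EXPECTED_MD5 = {
    "GosVL.TEI.zip": "cedbf26f3a3949dc19a01a82f682373e",
    "GosVL.vert.zip": "deef5b653f37d1d1cf361f0b5fa93cc3",
    "GosVL.TRS.zip": "721e3b749d30ed80f2692eb09e312229",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the archives are in the current working directory.
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually indicates an incomplete download, so fetching the archive again is a reasonable first step.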

- Name: GosVL.TRS.zip
- Size: 737.88 KB
- Format: application/zip
- Description: Transcriptions in TRS format with tabular metadata (in Slovene)
- MD5: 721e3b749d30ed80f2692eb09e312229
- GosVL.TRS
- GosVL20_zumer_s2.trs (10 kB)
- GosVL24_inter_s3.trs (29 kB)
- GosVL15_celia_s3.trs (25 kB)
- GosVL10_partn_s2.trs (50 kB)
- GosVL20_zumer_dis.txt (758 B)
- GosVL23_jeklo_s3.trs (10 kB)
- GosVL05_zsrce_s2.trs (11 kB)
- GosVL11_lhise_s3.trs (12 kB)
- GosVL16_stara_s3.trs (34 kB)
- GosVL19_pujsk_s3.trs (19 kB)
- GosVL25_nanom_s3.trs (23 kB)
- GosVL13_menin_dis.txt (695 B)
- GosVL07_kungf_s2.trs (68 kB)
- GosVL04_fitot_s2.trs (12 kB)
- GosVL09_ocean_g1.txt (150 B)
- GosVL08_droge_s2.trs (19 kB)
- GosVL25_nanom_dis.txt (756 B)
- GosVL24_inter_dis.txt (399 B)
- GosVL12_cujec_s2.trs (81 kB)
- GosVL15_celia_dis.txt (742 B)
- GosVL13_menin_s2.trs (28 kB)
- GosVL03_medit_s2.trs (21 kB)
- GosVL17_inten_s3.trs (19 kB)
- GosVL14_karci_g1.txt (286 B)
- GosVL18_aritm_dis.txt (743 B)
- GosVL03_medit_dis.txt (424 B)
- GosVL24_inter_s2.trs (28 kB)
- GosVL15_celia_s2.trs (25 kB)
- GosVL10_partn_dis.txt (447 B)
- GosVL23_jeklo_s2.trs (10 kB)
- GosVL11_lhise_s2.trs (11 kB)
- GosVL22_siste_s3.trs (15 kB)
- GosVL04_fitot_dis.txt (442 B)
- GosVL16_stara_dis.txt (754 B)
- GosVL05_zsrce_dis.txt (473 B)
- GosVL16_stara_s2.trs (33 kB)
- GosVL19_pujsk_s2.trs (19 kB)
- GosVL25_nanom_s2.trs (23 kB)
- GosVL02_kleme_g1.txt (155 B)
- GosVL18_aritm_s3.trs (27 kB)
- GosVL20_zumer_g1.txt (264 B)
- GosVL17_inten_s2.trs (19 kB)
- GosVL01_pravo_s3.trs (38 kB)
- GosVL15_celia_g2.txt (287 B)
- GosVL-README.trs.pdf (248 kB)
- GosVL10_partn_g1.txt (154 B)
- GosVL06_kzcoi_s3.trs (34 kB)
- GosVL05_zsrce_g1.txt (155 B)
- GosVL11_lhise_g2.txt (154 B)
- GosVL08_droge_dis.txt (410 B)
- GosVL22_siste_dis.txt (750 B)
- GosVL07_kungf_g1.txt (154 B)
- GosVL04_fitot_g1.txt (154 B)
- GosVL22_siste_s2.trs (15 kB)
- GosVL08_droge_g1.txt (151 B)
- GosVL12_cujec_g1.txt (288 B)
- GosVL21_poraz_s3.trs (13 kB)
- GosVL13_menin_g1.txt (285 B)
- 00INDEX.txt (4 kB)
- GosVL03_medit_g1.txt (154 B)
- GosVL21_poraz_dis.txt (723 B)
- GosVL12_cujec_dis.txt (696 B)
- GosVL18_aritm_s2.trs (26 kB)
- GosVL07_kungf_dis.txt (433 B)
- GosVL24_inter_g1.txt (153 B)
- GosVL01_pravo_s2.trs (38 kB)
- GosVL15_celia_g1.txt (287 B)
- GosVL06_kzcoi_s2.trs (33 kB)
- GosVL23_jeklo_g1.txt (150 B)
- GosVL11_lhise_dis.txt (490 B)
- GosVL06_kzcoi_dis.txt (461 B)
- GosVL11_lhise_g1.txt (154 B)
- GosVL16_stara_g1.txt (259 B)
- GosVL09_ocean_s3.trs (43 kB)
- GosVL19_pujsk_g1.txt (287 B)
- GosVL25_nanom_g1.txt (151 B)
- GosVL19_pujsk_dis.txt (786 B)
- GosVL23_jeklo_dis.txt (734 B)
- GosVL14_karci_s3.trs (20 kB)
- GosVL21_poraz_s2.trs (13 kB)
- GosVL17_inten_g1.txt (286 B)
- GosVL22_siste_g1.txt (285 B)
- GosVL09_ocean_s2.trs (43 kB)
- 00README.txt (158 B)
- GosVL02_kleme_s3.trs (14 kB)
- GosVL01_pravo_dis.txt (429 B)
- GosVL20_zumer_s3.trs (10 kB)
- GosVL14_karci_s2.trs (20 kB)
- GosVL18_aritm_g1.txt (265 B)
- GosVL10_partn_s3.trs (52 kB)
- trans-14.dtd (2 kB)
- GosVL05_zsrce_s3.trs (11 kB)
- GosVL01_pravo_g1.txt (154 B)
- GosVL06_kzcoi_g1.txt (155 B)
- GosVL14_karci_dis.txt (764 B)
- GosVL07_kungf_s3.trs (71 kB)
- GosVL04_fitot_s3.trs (12 kB)
- GosVL17_inten_dis.txt (705 B)
- GosVL09_ocean_dis.txt (426 B)
- GosVL08_droge_s3.trs (26 kB)
- GosVL12_cujec_s3.trs (82 kB)
- GosVL02_kleme_dis.txt (455 B)
- GosVL13_menin_s3.trs (29 kB)
- GosVL03_medit_s3.trs (22 kB)
- GosVL21_poraz_g1.txt (156 B)
- GosVL02_kleme_s2.trs (14 kB)
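
Besides opening them in Transcriber, the .trs files can be read with any XML parser. The following is a rough sketch, assuming the files follow the usual Transcriber layout in which `Turn` elements carry `speaker`, `startTime` and `endTime` attributes and the transcribed text sits between `Sync` markers. That layout is an assumption about these particular files and should be checked against the bundled trans-14.dtd; the function name `read_turns` and the path in the usage comment are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_turns(trs_path):
    """Return (speaker, start, end, text) tuples, one per Turn element.

    Assumes the usual Transcriber layout: Trans > Episode > Section > Turn,
    with the transcribed text interleaved with empty Sync elements.
    """
    root = ET.parse(trs_path).getroot()
    turns = []
    for turn in root.iter("Turn"):
        # Collect all text inside the turn and normalise whitespace.
        text = " ".join("".join(turn.itertext()).split())
        turns.append((
            turn.get("speaker"),
            float(turn.get("startTime", "nan")),
            float(turn.get("endTime", "nan")),
            text,
        ))
    return turns


# Usage (hypothetical path to one extracted file):
# for speaker, start, end, text in read_turns("GosVL.TRS/GosVL01_pravo_s2.trs"):
#     print(f"{start:8.2f}-{end:8.2f} {speaker}: {text[:60]}")
```

The sketch does not validate against the DTD and keeps only turn-level timing; finer alignment would need the individual `Sync` times as well.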