
 
dc.contributor.author Verdonik, Darinka
dc.contributor.author Zwitter Vitez, Ana
dc.contributor.author Zemljarič Miklavčič, Jana
dc.contributor.author Krek, Simon
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Potočnik, Tomaž
dc.contributor.author Bizjak, Andreja
dc.contributor.author Žgank, Andrej
dc.contributor.author Bernjak, Mitja
dc.contributor.author Antloga, Špela
dc.contributor.author Majhenič, Simona
dc.contributor.author Čakš, Peter
dc.contributor.author Pucer, Matevž
dc.contributor.author Cvetko, Mitja
dc.contributor.author Pavlič, Jani
dc.contributor.author Dobrišek, Simon
dc.contributor.author Križaj, Janez
dc.contributor.author Bajec, Marko
dc.contributor.author Lebar Bajec, Iztok
dc.contributor.author Jelovšek, Tjaša
dc.contributor.author Trojar, Mitja
dc.contributor.author Dretnik, Naum
dc.contributor.author Bordon, David
dc.contributor.author VideoLectures.NET
dc.date.accessioned 2025-02-13T09:12:38Z
dc.date.available 2025-02-13T09:12:38Z
dc.date.issued 2024-12-23
dc.identifier.uri http://hdl.handle.net/11356/1973
dc.description Gos 2.1 is the reference speech corpus of the Slovenian language. This edition contains about 300 hours of speech, or 2.4 million words, 127 thousand utterances and 1,500 texts. It is composed of three sources:
  (1) Spoken corpus Gos 1.1 (http://hdl.handle.net/11356/1438), 112 hours, 1 million words
  (2) Spoken corpus Gos VideoLectures 4.2 (http://hdl.handle.net/11356/1222), 22 hours, 179,000 words
  (3) A selection from the ASR database ARTUR 1.0 (http://hdl.handle.net/11356/1776), 185 hours, 1.2 million words, including:
    (3a) Artur-J-Splosni, 62 hours, 422,000 words: media recordings, online recordings of conferences, workshops, educational videos, etc.
    (3b) Artur-N-Prosti, 61 hours, 324,000 words: monologues and dialogues between two persons, recorded for the purposes of the Artur database. Speakers were asked to converse freely or to talk freely about casual topics.
    (3c) Artur-P-SejeDZ, 62 hours, 450,000 words: a selection of speeches from the Slovene National Assembly. The maximum length of a single speaker's speech is 4,000 words.
  This entry includes audio files and, for the television recordings only, video files. The audio files are in WAV format (PCM, 16-bit, mono, 44.1 kHz). The video files are in MP4 format. Transcript files are available at http://hdl.handle.net/11356/1863.
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.publisher Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.publisher Faculty of Electrical Engineering, University of Ljubljana
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://aclanthology.org/2024.lrec-main.691.pdf
dc.rights CLARIN.SI Licence ACA ID-BY-INF-NORED
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-inf-nored-1.0
dc.rights.label RES
dc.source.uri https://viri.cjvt.si/gos/System/About
dc.subject audio
dc.subject spoken corpus
dc.title Spoken corpus Gos 2.1 (audio, video)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding CLARIN.SI data & tools
demo.uri https://viri.cjvt.si/gos/
contact.person CJVT Centre for Language Resources and Technologies info@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
sponsor Republic of Slovenia, Ministry of Culture 3340-15-141005 Project Gos Videolectures nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
size.info 319 hours
files.count 4
files.size 108317552640
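
The description above states that the audio is WAV, PCM, 16-bit, mono, 44.1 kHz. After unpacking an archive, this can be spot-checked with a minimal sketch like the one below, which uses only Python's standard `wave` module. The function names and the `EXPECTED` dictionary are illustrative assumptions and not part of the corpus distribution; the internal layout of the archives is not described in this record, so the path passed in is up to the user.

```python
import wave

# Format stated in the record description: PCM, 16-bit, mono, 44.1 kHz.
# 16-bit samples correspond to a sample width of 2 bytes.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 44100}


def read_wav_format(path):
    """Read the basic header parameters of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def matches_stated_format(path):
    """Return True if the file has the channel count, sample width and rate stated above."""
    info = read_wav_format(path)
    return all(info[key] == value for key, value in EXPECTED.items())
```

Calling `matches_stated_format("some_recording.wav")` on an unpacked file (the file name here is hypothetical) returns `True` or `False`; `read_wav_format` additionally reports the duration in seconds.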


 Files in this item

This item is Restricted Use and licensed under: CLARIN.SI Licence ACA ID-BY-INF-NORED (Inform Before Use, Attribution Required)
Name: Gos.wav.tar
Size: 34.37 GB
Format: Unknown
Description: Gos part of the speech data
MD5: 9d25c54dfa68bcbdfa073bccc20eace5

Name: GosVL.wav.tar
Size: 6.52 GB
Format: Unknown
Description: Gos VideoLectures part of the speech data
MD5: 1427cf94f918bfd81ff8182475248bba

Name: Artur.wav.tar
Size: 52.45 GB
Format: Unknown
Description: Artur part of the speech data
MD5: a1004dfd05c44144d23e4cb43a591f52

Name: Gos.mp4.tar
Size: 7.54 GB
Format: Unknown
Description: Gos video data
MD5: 2b270db62d50add41ad6771694a057fa
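
Because the archives are large, it may be worth checking each download against the MD5 values listed above before unpacking. The following is a minimal sketch using Python's standard `hashlib`; the function names and the assumption that the archives sit in a single local directory under their listed names are illustrative, not something this record prescribes.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "Gos.wav.tar": "9d25c54dfa68bcbdfa073bccc20eace5",
    "GosVL.wav.tar": "1427cf94f918bfd81ff8182475248bba",
    "Artur.wav.tar": "a1004dfd05c44144d23e4cb43a591f52",
    "Gos.mp4.tar": "2b270db62d50add41ad6771694a057fa",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed archive name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the archives were saved to the current working directory.
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use constant regardless of archive size, which matters for files of tens of gigabytes.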
