dc.contributor.author Ljubešić, Nikola
dc.contributor.author Rupnik, Peter
dc.contributor.author Koržinek, Danijel
dc.date.accessioned 2024-02-08T15:40:33Z
dc.date.available 2024-02-08T15:40:33Z
dc.date.issued 2024-02-08
dc.identifier.uri http://hdl.handle.net/11356/1834
dc.description The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary recordings available from the Serbian Parliament's YouTube channel. The corpus consists of audio segments that correspond to specific sentences in the transcripts. The transcript contains word-level alignments to the recordings, allowing for simple further segmentation of long sentences into shorter segments for ASR and other memory-sensitive applications. Each segment has a reference to the ParlaMint 4.0 corpus (http://hdl.handle.net/11356/1859) via utterance IDs and character offsets. All the speaker information from the ParlaMint corpus is available via the "speaker_info" key.
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://doi.org/10.1007/978-3-031-77961-9_10
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/parlamint
dc.subject parliamentary debates
dc.subject speech recordings
dc.subject speech database
dc.subject speech recognition
dc.subject automatic speech recognition
dc.subject speech transcription
dc.title Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding CLARIN.SI data & tools
demo.uri https://huggingface.co/datasets/classla/ParlaSpeech-RS
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
size.info 290778 entries
size.info 3226388 seconds
size.info 896 hours
files.count 4
files.size 67789449157
featuredService.noske search|https://www.clarin.si/ske/#concordance?corpname=parlaspeech_rs
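The size fields above can be cross-checked against each other with a little arithmetic. The sketch below only uses the numbers stated in this record; the variable names are illustrative, and treating `files.size` as a byte count is an assumption.

```python
# Values copied from the size.info and files.size fields of this record.
entries = 290778            # size.info: entries
total_seconds = 3226388     # size.info: seconds
total_bytes = 67789449157   # files.size, assumed to be bytes

# Seconds to hours: should agree with the stated "896 hours".
total_hours = total_seconds / 3600

# Average duration of one audio segment, in seconds.
mean_segment_seconds = total_seconds / entries

# Total download size in binary gigabytes (GiB).
total_gib = total_bytes / 2**30

print(f"{total_hours:.1f} hours")
print(f"{mean_segment_seconds:.1f} seconds per entry on average")
print(f"{total_gib:.2f} GiB in total")
```

The average of roughly 11 seconds per entry gives a rough sense of segment length before downloading the audio archives.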


Files in this item

Name: ParlaSpeech-RS.v1.0.jsonl.gz
Size: 102.73 MB
Format: application/gzip
Description: Corpus text in gzipped JSON Lines format
MD5: 4b83f759fabd6d0dcb1bf391090b2143

Name: ParlaSpeech-RS.v1.0.part1.tgz
Size: 36.41 GB
Format: Unknown
Description: Speech in FLAC format, part 1
MD5: 83ff0608114a8c2701f712112ce88f03

Name: ParlaSpeech-RS.v1.0.part2.tgz
Size: 26.62 GB
Format: Unknown
Description: Speech in FLAC format, part 2
MD5: 628efb94708a9e10d02fd825ac853a4c

Name: README.txt
Size: 1 KB
Format: Text file
Description: Description of the corpus format
MD5: dc33d4dd9eb8d6b8a29a28fd1ed309cf
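
Since the audio archives are tens of gigabytes each, it may be worth checking a finished download against the MD5 values listed for the files. The following is a minimal sketch using only the Python standard library; the function names and the local file path are hypothetical, and the checksum shown is the one listed for ParlaSpeech-RS.v1.0.jsonl.gz.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the checksum is taken from the file listing.
    ok = matches_checksum(
        "ParlaSpeech-RS.v1.0.jsonl.gz",
        "4b83f759fabd6d0dcb1bf391090b2143",
    )
    print("checksum OK" if ok else "checksum mismatch")
```

Reading in 1 MiB chunks keeps memory use flat even for the 36 GB archive.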
File preview
Parliamentary spoken corpus of Serbian ParlaSpeech-RS v1.0
http://hdl.handle.net/11356/1834

The ParlaSpeech-RS.v1.0.jsonl (JSON lines) file consists of entries with the following attributes:

id: ParlaMint utterance ID with zero-based character offsets pointing to the specific part of the utterance
words: list of character and millisecond offsets to specific words in the transcript, especially useful for further segmentation of each entry
audio: path to the FLAC file (available from the part*.tgz files), the folder name corresponding to the YouTube video ID
audio_length: length of the recording in seconds
text: transcript of the audio
text_start: starting character position in the original ParlaMint 4.0 utterance
text_end: ending character position in the original ParlaMint 4.0 utterance
audio_start: starting millisecond position in the original YouTube video
audio_end: ending millisecond position in the original YouTube video
speaker_info: full information on the speaker (and speech) fr . . .
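
The attributes listed above can be read with a few lines of standard-library Python. This is a minimal sketch, not the authors' tooling: the function names and the local file path are hypothetical, and only the documented `id`, `audio`, `audio_length` and `text` keys are accessed, since the internal layout of `words` and `speaker_info` is not spelled out in this preview.

```python
import gzip
import json
from typing import Iterator


def read_entries(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def total_hours(path: str) -> float:
    """Sum the audio_length field (seconds) over all entries, in hours."""
    return sum(entry["audio_length"] for entry in read_entries(path)) / 3600.0


if __name__ == "__main__":
    # Hypothetical local path to the downloaded corpus text file.
    corpus_path = "ParlaSpeech-RS.v1.0.jsonl.gz"
    for index, entry in enumerate(read_entries(corpus_path)):
        print(entry["id"], entry["audio"], entry["audio_length"])
        print(entry["text"])
        if index == 2:  # show only the first three entries
            break
```

Streaming the file line by line avoids loading all entries into memory at once.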
                                            
