Files in this item

Name: ParlaSpeech-HR.v1.0.jsonl
Size: 679.52 MB
Format: Unknown
Description: Corpus in JSON Lines format
MD5: 271ef6589623facd86527b1e05b740f4
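Each file in this item is listed with an MD5 checksum. Because some files are very large (the FLAC slices are tens of gigabytes each), a download is best checked by hashing it in chunks rather than reading it into memory at once. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are hypothetical, and the script assumes the files were saved under their listed names in the working directory.

```python
import hashlib
import os

# MD5 checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "ParlaSpeech-HR.v1.0.jsonl": "271ef6589623facd86527b1e05b740f4",
    "ParlaSpeech-HR.v1.0.txt": "71a7479a87e107510c99bc2602e1076e",
    "ParlaSpeech-HR.flac.tgz.0": "84076b62f51eb1da9870c1f6c4da436b",
    "ParlaSpeech-HR.flac.tgz.1": "8123e76721d437837a2439dd662a973b",
    "ParlaSpeech-HR.flac.tgz.2": "cd8e71d1d93a3b89d10a208c288c824e",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumption: downloads are stored under their listed names in the current directory.
    for name, md5 in EXPECTED_MD5.items():
        if not os.path.isfile(name):
            print(f"{name}: not found, skipped")
        else:
            print(f"{name}: {'OK' if verify(name, md5) else 'MISMATCH'}")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters for the roughly 49 GB slices.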
Name: ParlaSpeech-HR.v1.0.txt
Size: 1.07 KB
Format: Text file
Description: README
MD5: 71a7479a87e107510c99bc2602e1076e
File preview:
ASR training dataset for Croatian ParlaSpeech-HR v1.0
http://hdl.handle.net/11356/1494

The ParlaSpeech-HR.v1.0.jsonl (json lines) file consists of entries with the following attributes:

path: name of the file with the segment recording
orig_file: name of the original file harvested from YouTube
start: time (in seconds) at which the segment starts in the original file
end: time (in seconds) at which the segment ends in the original file
words: list of words from the original transcript
word_start_times: relative start time (in seconds) of each word
norm_words: list of words normalized with an imperfect rule-based normaliser
norm_words_start_times: relative start time (in seconds) of each word in the normalized transcript
utterance_id_start: ID of the utterance in the ParlaMint 2.1 corpus (http://hdl.handle.net/11356/1432) where the segment starts
utterance_id_end: ID of the utterance in the ParlaMint 2.1 corpus where the segment ends
speaker_info: list of speaker attributes from ParlaMint 2.1, if single . . .
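The README describes one JSON object per line of `ParlaSpeech-HR.v1.0.jsonl`. Below is a minimal Python sketch for iterating over those records and using the timing fields. It relies only on `path`, `orig_file`, `start`, `end`, `words`, `word_start_times`, `norm_words` and `norm_words_start_times`. The helper names are hypothetical. The conversion to absolute times is an assumption: the README says only that word times are "relative", and the sketch takes them to be offsets from the segment's `start`.

```python
import json
import os
from typing import Iterator, List, Tuple


def iter_segments(path: str) -> Iterator[dict]:
    """Yield one segment record (a dict) per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def duration(segment: dict) -> float:
    """Segment length in seconds, from its start/end offsets in the original file."""
    return segment["end"] - segment["start"]


def absolute_word_times(segment: dict, normalized: bool = False) -> List[Tuple[str, float]]:
    """Pair each word with an estimated start time in the original file.

    ASSUMPTION: the per-word times are offsets from the segment's `start`.
    """
    if normalized:
        words, times = segment["norm_words"], segment["norm_words_start_times"]
    else:
        words, times = segment["words"], segment["word_start_times"]
    return [(w, segment["start"] + t) for w, t in zip(words, times)]


if __name__ == "__main__":
    corpus = "ParlaSpeech-HR.v1.0.jsonl"
    if os.path.isfile(corpus):
        first = next(iter_segments(corpus))
        print(first["path"], first["orig_file"], f"{duration(first):.2f} s")
        print(absolute_word_times(first)[:5])
```

Streaming line by line avoids loading the whole 680 MB file into memory.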
                                            
Name: ParlaSpeech-HR.flac.tgz.0
Size: 48.83 GB
Format: Unknown
Description: Speech in FLAC format, slice 0
MD5: 84076b62f51eb1da9870c1f6c4da436b
Name: ParlaSpeech-HR.flac.tgz.1
Size: 48.83 GB
Format: Unknown
Description: Speech in FLAC format, slice 1
MD5: 8123e76721d437837a2439dd662a973b
Name: ParlaSpeech-HR.flac.tgz.2
Size: 18.93 GB
Format: Unknown
Description: Speech in FLAC format, slice 2
MD5: cd8e71d1d93a3b89d10a208c288c824e
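The FLAC audio is distributed as three slices, `ParlaSpeech-HR.flac.tgz.0` to `ParlaSpeech-HR.flac.tgz.2`, roughly 117 GB in total. The listing does not say how the slices were produced. This sketch assumes they are consecutive byte ranges of a single gzip-compressed tar archive, which would mean reading them in order 0, 1, 2 as one stream. In a shell, that is the equivalent of piping `cat` of the slices into `tar`. The sketch streams the slices through Python's `tarfile` stream mode `r|gz`, so the full archive never has to be written to disk as one file. `ConcatReader` and `extract_slices` are hypothetical names. It is worth checking each slice's MD5 first.

```python
import io
import os
import tarfile
from typing import Iterable


class ConcatReader(io.RawIOBase):
    """Read-only raw stream that presents several files, in order, as one byte stream."""

    def __init__(self, paths: Iterable[str]) -> None:
        super().__init__()
        self._paths = list(paths)
        self._idx = 0
        self._fh = None

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Fill buf from the current slice; move on to the next slice when one is exhausted.
        while self._idx < len(self._paths):
            if self._fh is None:
                self._fh = open(self._paths[self._idx], "rb")
            n = self._fh.readinto(buf)
            if n:
                return n
            self._fh.close()
            self._fh = None
            self._idx += 1
        return 0  # all slices consumed: EOF

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None
        super().close()


def extract_slices(paths: Iterable[str], dest: str) -> None:
    """Stream-extract a .tgz assumed to be split into consecutive byte slices."""
    with io.BufferedReader(ConcatReader(paths), buffer_size=1 << 20) as stream:
        # "r|gz" reads a gzip-compressed tar sequentially, without seeking backwards.
        with tarfile.open(fileobj=stream, mode="r|gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")  # safer extraction where supported
            else:
                tar.extractall(dest)


if __name__ == "__main__":
    parts = [f"ParlaSpeech-HR.flac.tgz.{i}" for i in range(3)]  # order matters
    if all(os.path.isfile(p) for p in parts):
        extract_slices(parts, "ParlaSpeech-HR.flac")
```

The `filter="data"` argument exists in Python 3.12 and later, and in some security-patched earlier releases. The `hasattr` check falls back to plain extraction elsewhere.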