Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)




- Name
- ParlaSpeech-HR.v1.0.jsonl
- Size
- 679.52 MB
- Format
- Unknown
- Description
- Corpus in JSON Lines format
- MD5
- 271ef6589623facd86527b1e05b740f4

- Name
- ParlaSpeech-HR.v1.0.txt
- Size
- 1.07 KB
- Format
- Text file
- Description
- README
- MD5
- 71a7479a87e107510c99bc2602e1076e
ASR training dataset for Croatian ParlaSpeech-HR v1.0
http://hdl.handle.net/11356/1494

The ParlaSpeech-HR.v1.0.jsonl (json lines) file consists of entries with the following attributes:
path: name of the file with the segment recording
orig_file: name of the original file harvested from YouTube
start: second when the segment starts in the original file
end: second when the segment ends in the original file
words: list of words from the original transcript
word_start_times: relative time references (in seconds) to each word
norm_words: list of words normalized with an imperfect rule-based normaliser
norm_words_start_times: relative time references (in seconds) to each word in the normalized transcript
utterance_id_start: ID of the utterance in the ParlaMint 2.1 corpus (http://hdl.handle.net/11356/1432) where the segment starts
utterance_id_end: ID of the utterance in the ParlaMint 2.1 corpus where the segment ends
speaker_info: list of speaker attributes from ParlaMint 2.1, if single . . .
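
The attributes listed in the README preview can be read with any JSON Lines parser. The following is a minimal sketch, not part of the dataset distribution: the helper names (`iter_segments`, `segment_duration`, `absolute_word_times`) are hypothetical, and it assumes that `word_start_times` are offsets from the segment's `start`, which the README preview suggests but does not state outright.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_segments(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def segment_duration(entry: dict) -> float:
    """Segment length in seconds, from its start/end in the original file."""
    return float(entry["end"]) - float(entry["start"])


def absolute_word_times(entry: dict) -> List[Tuple[str, float]]:
    """Pair each word with its start time in the original file.

    Assumption: word_start_times are relative to the segment start.
    """
    start = float(entry["start"])
    return [
        (word, start + float(offset))
        for word, offset in zip(entry["words"], entry["word_start_times"])
    ]
```

To stream the real file without loading all 679.52 MB into memory, pass an open file handle, for example `with open("ParlaSpeech-HR.v1.0.jsonl", encoding="utf-8") as fh: for entry in iter_segments(fh): ...`.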

- Name
- ParlaSpeech-HR.flac.tgz.0
- Size
- 48.83 GB
- Format
- Unknown
- Description
- Speech in FLAC format, slice 0
- MD5
- 84076b62f51eb1da9870c1f6c4da436b

- Name
- ParlaSpeech-HR.flac.tgz.1
- Size
- 48.83 GB
- Format
- Unknown
- Description
- Speech in FLAC format, slice 1
- MD5
- 8123e76721d437837a2439dd662a973b

- Name
- ParlaSpeech-HR.flac.tgz.2
- Size
- 18.93 GB
- Format
- Unknown
- Description
- Speech in FLAC format, slice 2
- MD5
- cd8e71d1d93a3b89d10a208c288c824e
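
Because the audio slices are tens of gigabytes each, it may be worth checking downloads against the MD5 values listed above before unpacking. The sketch below uses only the Python standard library; the function names and the idea of a single download directory are assumptions made for illustration, while the checksums are copied from the listing.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing above.
EXPECTED_MD5 = {
    "ParlaSpeech-HR.v1.0.jsonl": "271ef6589623facd86527b1e05b740f4",
    "ParlaSpeech-HR.v1.0.txt": "71a7479a87e107510c99bc2602e1076e",
    "ParlaSpeech-HR.flac.tgz.0": "84076b62f51eb1da9870c1f6c4da436b",
    "ParlaSpeech-HR.flac.tgz.1": "8123e76721d437837a2439dd662a973b",
    "ParlaSpeech-HR.flac.tgz.2": "cd8e71d1d93a3b89d10a208c288c824e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large slices never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {filename: True/False/None}; None marks a file that is not present."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

Calling `verify_downloads(".")` in the download directory returns `True` for each file whose checksum matches, `False` for a mismatch (likely a corrupted or incomplete download), and `None` for files not yet downloaded.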