ASR training dataset for Croatian ParlaSpeech-HR v1.0

http://hdl.handle.net/11356/1494

The ParlaSpeech-HR.v1.0.jsonl (JSON Lines) file consists of entries with the following attributes:

path: name of the file containing the segment recording
orig_file: name of the original file harvested from YouTube
start: time (in seconds) at which the segment starts in the original file
end: time (in seconds) at which the segment ends in the original file
words: list of words from the original transcript
word_start_times: relative start times (in seconds) of each word in words
norm_words: list of words normalized with an imperfect rule-based normalizer
norm_words_start_times: relative start times (in seconds) of each word in the normalized transcript
utterance_id_start: ID of the utterance in the ParlaMint 2.1 corpus (http://hdl.handle.net/11356/1432) where the segment starts
utterance_id_end: ID of the utterance in the ParlaMint 2.1 corpus where the segment ends
speaker_info: list of speaker attributes from ParlaMint 2.1 if the segment has a single speaker, null otherwise
split: "train", "dev", or "test", or "null" if the segment contains multiple speakers
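The file is a JSON Lines file, so each line can be parsed on its own. The following is a minimal sketch, not an official loader, assuming every non-empty line is a JSON object with the attributes listed above. The helper names load_entries, hours_per_split and word_timings are hypothetical. Because the description above leaves open whether a multi-speaker split is stored as a JSON null or as the string "null", the sketch accepts both forms.

```python
import json
from collections import defaultdict
from typing import Iterable, Optional


def load_entries(lines: Iterable[str]) -> list:
    """Parse JSON Lines input into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def hours_per_split(entries: Iterable[dict]) -> dict:
    """Sum segment durations (end - start, in seconds) per split, in hours.

    Multi-speaker segments (split is JSON null or the string "null",
    an assumption since the README is ambiguous) are grouped under None.
    """
    totals = defaultdict(float)
    for entry in entries:
        split: Optional[str] = entry.get("split")
        if split == "null":
            split = None
        totals[split] += (entry["end"] - entry["start"]) / 3600
    return dict(totals)


def word_timings(entry: dict, normalized: bool = False) -> list:
    """Pair each word with its relative start time.

    Uses norm_words / norm_words_start_times when normalized is True,
    otherwise words / word_start_times. zip() stops at the shorter list.
    """
    if normalized:
        words, starts = entry["norm_words"], entry["norm_words_start_times"]
    else:
        words, starts = entry["words"], entry["word_start_times"]
    return list(zip(words, starts))
```

Typical usage would be opening the file with `open("ParlaSpeech-HR.v1.0.jsonl", encoding="utf-8")`, passing the file object to `load_entries`, and then passing the result to `hours_per_split` to see how much single-speaker audio falls into each split.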