Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name
- ParlaSpeech-PL.v1.0.jsonl.gz
- Size
- 124.96 MB
- Format
- application/gzip
- Description
- Corpus text in gzipped JSON Lines format
- MD5
- a186d99cf96f15be6898cf86bc261f34

- Name
- ParlaSpeech-PL.v1.0.part1.tgz
- Size
- 27.87 GB
- Format
- Unknown
- Description
- Speech in FLAC format, part 1
- MD5
- cbef0242706ee876bd27e7e151c69ba2

- Name
- ParlaSpeech-PL.v1.0.part2.tgz
- Size
- 30.74 GB
- Format
- Unknown
- Description
- Speech in FLAC format, part 2
- MD5
- 95332c745a4c79a56dcc78bc34b30cb1

- Name
- README.txt
- Size
- 1 KB
- Format
- Text file
- Description
- Description of the corpus format
- MD5
- 53d3b9c770e2ed6f4cbff71b6d4f267e
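
The MD5 values listed above can be used to check a download for corruption. The following is a minimal sketch in Python, not part of the corpus distribution: the function names `md5_of_file` and `verify_download` are hypothetical, and it assumes the files were saved under the names shown in the listing.

```python
import hashlib

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "ParlaSpeech-PL.v1.0.jsonl.gz": "a186d99cf96f15be6898cf86bc261f34",
    "ParlaSpeech-PL.v1.0.part1.tgz": "cbef0242706ee876bd27e7e151c69ba2",
    "ParlaSpeech-PL.v1.0.part2.tgz": "95332c745a4c79a56dcc78bc34b30cb1",
    "README.txt": "53d3b9c770e2ed6f4cbff71b6d4f267e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for the multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("README.txt", EXPECTED_MD5["README.txt"])` should return `True` for an intact copy saved in the current directory.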

Parliamentary spoken corpus of Polish ParlaSpeech-PL v1.0
http://hdl.handle.net/11356/1686

The ParlaSpeech-PL.v1.0.jsonl (JSON Lines) file consists of entries with the following attributes:

- id: ParlaMint utterance ID with zero-based character offsets pointing to the specific part of the utterance
- words: list of character and millisecond offsets to specific words in the transcript, especially useful for further segmentation of each entry
- audio: path to the FLAC file (available from the part*.tgz files); the folder name corresponds to the YouTube video ID
- audio_length: length of the recording in seconds
- text: transcript of the audio
- text_start: starting character position in the original ParlaMint 4.0 utterance
- text_end: ending character position in the original ParlaMint 4.0 utterance
- audio_start: starting millisecond position in the original YouTube video
- audio_end: ending millisecond position in the original YouTube video
- speaker_info: full information on the speaker (and speech) fro . . .
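
The entry attributes described in the README can be read one line at a time straight from the gzipped file. The following is a minimal sketch in Python under stated assumptions: the helper names `iter_entries` and `summarize` are hypothetical, the attribute names are taken from the README text, and the value types (numeric offsets, string paths) are assumed rather than confirmed. The internal layout of `words` and `speaker_info` is not specified in the visible part of the README, so the sketch does not touch them.

```python
import gzip
import json


def iter_entries(path):
    """Yield one dict per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(entry):
    """Collect the positional attributes of one entry into a small dict.

    Assumes the attribute names listed in the README are present as keys.
    """
    return {
        "id": entry["id"],
        "audio": entry["audio"],                  # path inside the part*.tgz archives
        "audio_length": entry["audio_length"],    # seconds, per the README
        "text_span": (entry["text_start"], entry["text_end"]),        # character positions
        "audio_span_ms": (entry["audio_start"], entry["audio_end"]),  # milliseconds
    }
```

A typical use would be `for entry in iter_entries("ParlaSpeech-PL.v1.0.jsonl.gz"): print(summarize(entry))`, which streams the corpus without decompressing it to disk first.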