dc.contributor.author Kopp, Matyáš
dc.contributor.author Kryvenko, Anna
dc.contributor.author Rii, Andriana
dc.date.accessioned 2023-11-30T10:03:49Z
dc.date.available 2023-11-30T10:03:49Z
dc.date.issued 2023-11-29
dc.identifier.uri http://hdl.handle.net/11356/1900
dc.description The Ukrainian parliamentary corpus ParlaMint-UA 4.0.1 is an extended version of the ParlaMint-UA 4.0 corpus (available as a collection of plain texts along with TSV metadata of the speeches, http://hdl.handle.net/11356/1859, and as a collection of speeches with added automatic linguistic annotations, http://hdl.handle.net/11356/1860), both being part of the “ParlaMint: Towards Comparable Parliamentary Corpora” project by CLARIN ERIC (https://www.clarin.eu/parlamint). ParlaMint-UA 4.0.1 contains plenary proceedings for the 4th, 5th, 6th, 7th, 8th and 9th terms of the Rada between 14 May 2002 and 10 November 2023. Ukrainian tokens make up 94% of the corpus and Russian tokens 6%. The transcripts are grouped by date, with information on the term, session and meeting, and contain speeches marked with the speaker and their role (chair, regular speaker or guest). The speeches also contain marked-up transcriber comments, such as noise, applause, shouting, etc. The corpus has extensive metadata on speakers, including their name, year of birth (when available in open sources), gender, MP and minister status, and party affiliation (when known from open sources), and on political parties, parliamentary factions and groups, including their name, left-to-right political orientation (Wikipedia-sourced, or manually encoded when absent in Wikipedia) and coalition/opposition status. The corpus is encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/) and also follows the much stricter ParlaMint encoding guidelines (https://clarin-eric.github.io/ParlaMint/) and schemas. The corpus comes in two versions. One version contains plain texts of plenary speeches.
The other version contains texts of the same plenary speeches with linguistic annotation, including tokenization; sentence segmentation; lemmatisation; Universal Dependencies part-of-speech tags, morphological features, and syntactic dependencies; and the 4-class CoNLL-2003 named entities. Compared to ParlaMint-UA 4.0, ParlaMint-UA 4.0.1 doubles the time span: it now includes older data from 2002 to 2012 and more recent data from September to November 2023. It refines language identification between Ukrainian and Russian from the paragraph level to the sentence level to advance research on code-switching in public discourse. In addition, the errors found in ParlaMint 4.0 have been corrected.
dc.language.iso ukr
dc.language.iso rus
dc.publisher CLARIN.SI
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://ufal.github.io/ParlaMint-UA/
dc.subject Parla-CLARIN
dc.subject parliamentary debates
dc.subject TEI
dc.subject Ukrainian Parliament
dc.title Ukrainian parliamentary corpus ParlaMint-UA 4.0.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matyáš Kopp kopp@ufal.mff.cuni.cz Charles University
contact.person Anna Kryvenko Ganna.Kryvenko@inz.si Institute of Contemporary History
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0288 the MSCA Seal of Excellence postdoctoral project 'The Changing Discursive Semantics of EU Representations' (2022-2024) nationalFunds
size.info 429437 utterances
size.info 41997790 words
files.count 4
files.size 4123217735
featuredService.kontext search|https://www.clarin.si/kontext/query?corpname=parlamint401_ua
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=parlamint401_ua


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name ParlaMint-UA.tgz
Size 291.4 MB
Format Unknown
Description Corpus in source TEI and derived formats
MD5 626ceb1087796082e99347a89ed8dd81

Name ParlaMint-UA.ana.tgz
Size 3.55 GB
Format Unknown
Description Linguistically annotated corpus in source TEI and derived formats
MD5 7b9443ad6f8591c25cbf0477cbc0a62a

Name ParlaMint-Schema.zip
Size 224.7 KB
Format application/zip
Description XML schemas for the corpus
MD5 68b5ee980f3c94a6a531fe8aee4b4066
File Preview
  • ParlaMint-Schema
    • ParlaMint.odd.rng (208 kB)
    • ParlaMint.rnc (13 kB)
    • README.md (4 kB)
    • ParlaMint.odd.rnc (98 kB)
    • parla-clarin.rng (608 kB)
    • ParlaMint-TEI.ana.rng (9 kB)
    • ParlaMint-TEI.ana.rnc (4 kB)
    • ParlaMint-listOrg.rng (680 B)
    • ParlaMint-listOrg.rnc (379 B)
    • ParlaMint-teiCorpus.ana.rng (2 kB)
    • ParlaMint-teiCorpus.ana.rnc (1 kB)
    • ParlaMint-teiCorpus.rng (4 kB)
    • ParlaMint-listPerson.rng (690 B)
    • ParlaMint-schemaSpecs.odd.xml (331 kB)
    • ParlaMint-TEI.rng (16 kB)
    • ParlaMint-teiCorpus.rnc (2 kB)
    • ParlaMint-listPerson.rnc (389 B)
    • ParlaMint-TEI.rnc (6 kB)
    • ParlaMint.odd.sch (569 B)
    • ParlaMint.odd.xml (208 kB)
    • ParlaMint-taxonomy.rng (683 B)
    • ParlaMint.rng (28 kB)
    • ParlaMint-taxonomy.rnc (382 B)

Name ParlaMint-UA.logs.zip
Size 423.01 KB
Format application/zip
Description Build log files of the corpus
MD5 75d2273b9f600a92675fd872dc037742
File Preview
  • ParlaMint-UA.logs
    • ParlaMint-UA.TSV.log (1 MB)
    • ParlaMint-UA.log (5 MB)
    • ParlaMint-UA.error.log (0 B)
    • ParlaMint-UA.warn.log (1 MB)
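
The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch in Python, assuming the files were saved under the names shown in this record. The helper names `md5_of_file` and `verify_download` are illustrative and not part of any ParlaMint tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "ParlaMint-UA.tgz": "626ceb1087796082e99347a89ed8dd81",
    "ParlaMint-UA.ana.tgz": "7b9443ad6f8591c25cbf0477cbc0a62a",
    "ParlaMint-Schema.zip": "68b5ee980f3c94a6a531fe8aee4b4066",
    "ParlaMint-UA.logs.zip": "75d2273b9f600a92675fd872dc037742",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the 3.55 GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a downloaded file against the checksum listed above.

    Hypothetical helper: the lookup is by file name, so the file must
    keep the name it has in this record. Raises KeyError otherwise.
    """
    path = Path(path)
    expected = EXPECTED_MD5[path.name]
    return md5_of_file(path) == expected
```

For example, `verify_download("ParlaMint-UA.tgz")` returns `True` only if the local copy hashes to the value listed in this record. A `False` result suggests an incomplete or corrupted download.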
