dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ogrodniczuk, Maciej
dc.contributor.author Osenova, Petya
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Simov, Kiril
dc.contributor.author Grigorova, Vladislava
dc.contributor.author Rudolf, Michał
dc.contributor.author Pančur, Andrej
dc.contributor.author Kopp, Matyáš
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author van der Pol, Henk
dc.contributor.author Depoorter, Griet
dc.contributor.author de Does, Jesse
dc.contributor.author Jongejan, Bart
dc.contributor.author Haltrup Hansen, Dorte
dc.contributor.author Navarretta, Costanza
dc.contributor.author Calzada Pérez, María
dc.contributor.author de Macedo, Luciana D.
dc.contributor.author van Heusden, Ruben
dc.contributor.author Marx, Maarten
dc.contributor.author Çöltekin, Çağrı
dc.contributor.author Coole, Matthew
dc.contributor.author Agnoloni, Tommaso
dc.contributor.author Frontini, Francesca
dc.contributor.author Montemagni, Simonetta
dc.contributor.author Quochi, Valeria
dc.contributor.author Venturi, Giulia
dc.contributor.author Ruisi, Manuela
dc.contributor.author Marchetti, Carlo
dc.contributor.author Battistoni, Roberto
dc.contributor.author Sebők, Miklós
dc.contributor.author Ring, Orsolya
dc.contributor.author Darģis, Roberts
dc.contributor.author Utka, Andrius
dc.contributor.author Petkevičius, Mindaugas
dc.contributor.author Briedienė, Monika
dc.contributor.author Krilavičius, Tomas
dc.contributor.author Morkevičius, Vaidas
dc.contributor.author Bartolini, Roberto
dc.contributor.author Cimino, Andrea
dc.contributor.author Diwersy, Sascha
dc.contributor.author Luxardo, Giancarlo
dc.contributor.author Rayson, Paul
dc.date.accessioned 2021-06-18T09:25:46Z
dc.date.available 2021-06-18T09:25:46Z
dc.date.issued 2021-06-18
dc.identifier.uri http://hdl.handle.net/11356/1431
dc.description ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata covering aspects of the parliament and the speakers (name, gender, MP status, party affiliation, party coalition/opposition). They are structured into time-stamped terms, sessions and meetings, with each speech marked with its speaker and the speaker's role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/) and have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1432. The ParlaMint.ana linguistic annotation includes tokenisation, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech tags, morphological features and syntactic dependencies, and 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools.
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. Compared with the previous version 2.0, this version corrects some errors in various corpora and adds information on upper / lower house for bicameral parliaments. The vertical files have also been changed to make them easier to use in the concordancers.
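As a rough illustration of the description above, the following minimal sketch splits session dates at the stated cutoff (November 1st 2019) and reads one sentence in the standard ten-column CoNLL-U layout. The function names, the "covid" / "reference" label strings and the sample sentence are assumptions made for this example; they are not taken from the ParlaMint files or scripts.

```python
from datetime import date

# Cutoff stated in the description: sessions from this date on belong
# to the COVID-19 period, earlier ones are "reference".
COVID_START = date(2019, 11, 1)

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def session_period(iso_date: str) -> str:
    """Return "covid" or "reference" (illustrative labels) for a YYYY-MM-DD date."""
    return "covid" if date.fromisoformat(iso_date) >= COVID_START else "reference"


def parse_conllu_sentence(block: str) -> list[dict[str, str]]:
    """Parse one CoNLL-U sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        tokens.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return tokens


# Made-up sample sentence, not taken from any ParlaMint corpus.
sample = (
    "# text = We agree.\n"
    "1\tWe\twe\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tagree\tagree\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
tokens = parse_conllu_sentence(sample)
```

How the period and the speech metadata are actually recorded in the TSV and TEI files should be checked against the corpus headers and the project repository.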
dc.language.iso bul
dc.language.iso hrv
dc.language.iso pol
dc.language.iso slv
dc.language.iso ces
dc.language.iso isl
dc.language.iso fra
dc.language.iso nld
dc.language.iso dan
dc.language.iso spa
dc.language.iso tur
dc.language.iso eng
dc.language.iso ita
dc.language.iso hun
dc.language.iso lav
dc.language.iso lit
dc.publisher CLARIN ERIC
dc.relation.replaces http://hdl.handle.net/11356/1405
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/content/parlamint
dc.subject Parla-CLARIN
dc.subject parliamentary debates
dc.subject COVID-19
dc.subject TEI
dc.subject Bulgarian Parliament
dc.subject Croatian Parliament
dc.subject Polish Parliament
dc.subject Slovenian Parliament
dc.subject Czech Parliament
dc.subject Icelandic Parliament
dc.subject Belgian Parliament
dc.subject Danish Parliament
dc.subject Spanish Parliament
dc.subject Dutch Parliament
dc.subject Turkish Parliament
dc.subject English Parliament
dc.subject Italian Parliament
dc.subject Hungarian Parliament
dc.subject Latvian Parliament
dc.subject Lithuanian Parliament
dc.subject French Parliament
dc.title Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarin-eric/ParlaMint/
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Ministry of Education and Science Republic of Bulgaria DO01-272/16.12.2019 Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies CLaDA-BG nationalFunds
sponsor LINDAT/CLARIAH-CZ LM2018101 Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Spanish Ministry of Science and Innovation PID2019-108866RB-I0 / AEI / 10.13039/501100011033 Original, translated and interpreted representations of the refugee cris(e)s: methodological triangulation within corpus-based discourse studies nationalFunds
sponsor The Research Council of Lithuania P-MIP-20-373 Policy Agenda of the Lithuanian Seimas and its Framing: The Analysis of the Seimas Debates in 1990-2020 nationalFunds
sponsor CLARIN-LV, European Regional Development Fund project 1.1.1.5/18/I/016 University of Latvia and institutes in the European Research Area - Excellency, activity, mobility, capacity Other
size.info 3774204 utterances
size.info 494949904 words
files.count 18
files.size 25096118813
featuredService.kontext Belgian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_be
featuredService.kontext Bulgarian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_bg
featuredService.kontext Czech|https://www.clarin.si/kontext/first_form?corpname=parlamint21_cz
featuredService.kontext Danish|https://www.clarin.si/kontext/first_form?corpname=parlamint21_dk
featuredService.kontext Spanish|https://www.clarin.si/kontext/first_form?corpname=parlamint21_es
featuredService.kontext British|https://www.clarin.si/kontext/first_form?corpname=parlamint21_gb
featuredService.kontext Croatian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_hr
featuredService.kontext Hungarian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_hu
featuredService.kontext Icelandic|https://www.clarin.si/kontext/first_form?corpname=parlamint21_is
featuredService.kontext Italian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_it
featuredService.kontext Lithuanian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_lt
featuredService.kontext Latvian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_lv
featuredService.kontext Dutch|https://www.clarin.si/kontext/first_form?corpname=parlamint21_nl
featuredService.kontext Polish|https://www.clarin.si/kontext/first_form?corpname=parlamint21_pl
featuredService.kontext Slovenian|https://www.clarin.si/kontext/first_form?corpname=parlamint21_si
featuredService.kontext Turkish|https://www.clarin.si/kontext/first_form?corpname=parlamint21_tr
featuredService.noske ParlaMint corpora|https://www.clarin.si/noske/parlamint21.cgi/
featuredService.noske Belgian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_be&struct_attr_stats=1
featuredService.noske Bulgarian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_bg&struct_attr_stats=1
featuredService.noske Czech|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_cz&struct_attr_stats=1
featuredService.noske Danish|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_dk&struct_attr_stats=1
featuredService.noske Spanish|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_es&struct_attr_stats=1
featuredService.noske British|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_gb&struct_attr_stats=1
featuredService.noske Croatian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_hr&struct_attr_stats=1
featuredService.noske Hungarian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_hu&struct_attr_stats=1
featuredService.noske Icelandic|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_is&struct_attr_stats=1
featuredService.noske Italian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_it&struct_attr_stats=1
featuredService.noske Lithuanian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_lt&struct_attr_stats=1
featuredService.noske Latvian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_lv&struct_attr_stats=1
featuredService.noske Dutch|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_nl&struct_attr_stats=1
featuredService.noske Polish|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_pl&struct_attr_stats=1
featuredService.noske Slovenian|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_si&struct_attr_stats=1
featuredService.noske Turkish|https://www.clarin.si/noske/run.cgi/corp_info?corpname=parlamint21_tr&struct_attr_stats=1
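
The per-corpus concordancer links above follow one visible pattern: a fixed base URL plus a `corpname=parlamint21_<code>` query parameter. The sketch below rebuilds them from the two-letter codes that appear in the listing. The function names are hypothetical, and the sketch only reproduces the URLs shown in this record; it does not check that the services respond.

```python
from urllib.parse import urlencode

# Base URLs copied from the featuredService entries in this record.
KONTEXT_BASE = "https://www.clarin.si/kontext/first_form"
NOSKE_BASE = "https://www.clarin.si/noske/run.cgi/corp_info"

# Corpus codes that appear in the featuredService entries.
CORPUS_CODES = ("be", "bg", "cz", "dk", "es", "gb", "hr", "hu",
                "is", "it", "lt", "lv", "nl", "pl", "si", "tr")


def _corpname(code: str) -> str:
    """Return the corpus name for a listed code, e.g. "parlamint21_be"."""
    if code not in CORPUS_CODES:
        raise ValueError(f"no featured service listed for code {code!r}")
    return f"parlamint21_{code}"


def kontext_url(code: str) -> str:
    """Build the KonText query-form URL for one corpus."""
    return f"{KONTEXT_BASE}?{urlencode({'corpname': _corpname(code)})}"


def noske_url(code: str) -> str:
    """Build the noSketch Engine corpus-info URL for one corpus."""
    query = urlencode({"corpname": _corpname(code), "struct_attr_stats": 1})
    return f"{NOSKE_BASE}?{query}"
```

For example, `kontext_url("si")` yields the Slovenian KonText link listed above.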


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name                  Size       Format   MD5                               Description
ParlaMint-BE.ana.tgz  1.48 GB    Unknown  b09fb2ebd4e60d7a8e6adf7f8ecf1f4d  Belgian corpus
ParlaMint-BG.ana.tgz  1.08 GB    Unknown  5d85dd71bdc5e52f4205314129b144f6  Bulgarian corpus
ParlaMint-CZ.ana.tgz  1.29 GB    Unknown  98d530d3530f0c3630870d9f83b4e3bc  Czech corpus
ParlaMint-DK.ana.tgz  1.27 GB    Unknown  e35904610860dcd9075a22b046f05ee4  Danish corpus
ParlaMint-ES.ana.tgz  595.01 MB  Unknown  5730b556cb1bf9fa4aa9eee730dd22e8  Spanish corpus
ParlaMint-FR.ana.tgz  1.5 GB     Unknown  ff96aaecad68812c51c8b7e44e4097d7  French corpus
ParlaMint-GB.ana.tgz  4.31 GB    Unknown  a320f73e32b7ade02fa4473a07e2bdfc  British corpus
ParlaMint-HR.ana.tgz  1.02 GB    Unknown  d5a4c0e9f31a82769829f863cfbd4629  Croatian corpus
ParlaMint-HU.ana.tgz  49.57 MB   Unknown  f51930618ab70d20dea57e98f17af4c9  Hungarian corpus
ParlaMint-IS.ana.tgz  1.14 GB    Unknown  bf5e3fa8341eaeea7ee2e8003e0c5817  Icelandic corpus
ParlaMint-IT.ana.tgz  1.31 GB    Unknown  91def549bdb9d7d1cb5397b4267c1d4d  Italian corpus
ParlaMint-LT.ana.tgz  856.03 MB  Unknown  80414cb700bcca708059a5664d0dbf84  Lithuanian corpus
ParlaMint-LV.ana.tgz  372.41 MB  Unknown  c347133927571a634d530fa7addec419  Latvian corpus
ParlaMint-NL.ana.tgz  2.2 GB     Unknown  c7156d1fe239974eb315733b7e99b769  Dutch corpus
ParlaMint-PL.ana.tgz  1.51 GB    Unknown  181190206b281d82690886a7007d559b  Polish corpus
ParlaMint-SI.ana.tgz  1.08 GB    Unknown  f869a82fbc9080b88729501510f482b3  Slovenian corpus
ParlaMint-TR.ana.tgz  2.34 GB    Unknown  f33a594f4656aecf325d82b47f31671c  Turkish corpus
ParlaMint-2.1.tgz     4.72 MB    Unknown  32280fec61af1baff34bc4c84d31461b  https://github.com/clarin-eric/ParlaMint/releases/tag/v2.1 (samples, schemas, scripts)
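
Since the listing gives an MD5 checksum for every archive, a download can be checked before unpacking. The following is a minimal sketch using only the Python standard library; the function names and the `EXPECTED_MD5` table are illustrative, and only two checksums are copied here from the listing (the remaining ones can be added in the same way).

```python
import hashlib
from pathlib import Path

# Two checksums copied from the file listing; add the other archives as needed.
EXPECTED_MD5 = {
    "ParlaMint-HU.ana.tgz": "f51930618ab70d20dea57e98f17af4c9",
    "ParlaMint-2.1.tgz": "32280fec61af1baff34bc4c84d31461b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its name."""
    path = Path(path)
    if path.name not in expected:
        raise KeyError(f"no checksum recorded for {path.name}")
    return md5_of_file(path) == expected[path.name]
```

After downloading, `verify_download("ParlaMint-HU.ana.tgz")` should return `True` for an intact copy; a `False` result suggests a truncated or corrupted download.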
