dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Kopp, Matyáš |
dc.contributor.author | Ogrodniczuk, Maciej |
dc.contributor.author | Osenova, Petya |
dc.contributor.author | Agirrezabal, Manex |
dc.contributor.author | Agnoloni, Tommaso |
dc.contributor.author | Aires, José |
dc.contributor.author | Albini, Monica |
dc.contributor.author | Alkorta, Jon |
dc.contributor.author | Antiba-Cartazo, Iván |
dc.contributor.author | Arrieta, Ekain |
dc.contributor.author | Barcala, Mario |
dc.contributor.author | Bardanca, Daniel |
dc.contributor.author | Barkarson, Starkaður |
dc.contributor.author | Bartolini, Roberto |
dc.contributor.author | Battistoni, Roberto |
dc.contributor.author | Bel, Nuria |
dc.contributor.author | Bonet Ramos, Maria del Mar |
dc.contributor.author | Calzada Pérez, María |
dc.contributor.author | Cardoso, Aida |
dc.contributor.author | Çöltekin, Çağrı |
dc.contributor.author | Coole, Matthew |
dc.contributor.author | Darģis, Roberts |
dc.contributor.author | de Libano, Ruben |
dc.contributor.author | Depoorter, Griet |
dc.contributor.author | Diwersy, Sascha |
dc.contributor.author | Dodé, Réka |
dc.contributor.author | Fernandez, Kike |
dc.contributor.author | Fernández Rei, Elisa |
dc.contributor.author | Frontini, Francesca |
dc.contributor.author | Garcia, Marcos |
dc.contributor.author | García Díaz, Noelia |
dc.contributor.author | García Louzao, Pedro |
dc.contributor.author | Gavriilidou, Maria |
dc.contributor.author | Gkoumas, Dimitris |
dc.contributor.author | Grigorov, Ilko |
dc.contributor.author | Grigorova, Vladislava |
dc.contributor.author | Haltrup Hansen, Dorte |
dc.contributor.author | Iruskieta, Mikel |
dc.contributor.author | Jarlbrink, Johan |
dc.contributor.author | Jelencsik-Mátyus, Kinga |
dc.contributor.author | Jongejan, Bart |
dc.contributor.author | Kahusk, Neeme |
dc.contributor.author | Kirnbauer, Martin |
dc.contributor.author | Kryvenko, Anna |
dc.contributor.author | Ligeti-Nagy, Noémi |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Luxardo, Giancarlo |
dc.contributor.author | Magariños, Carmen |
dc.contributor.author | Magnusson, Måns |
dc.contributor.author | Marchetti, Carlo |
dc.contributor.author | Marx, Maarten |
dc.contributor.author | Meden, Katja |
dc.contributor.author | Mendes, Amália |
dc.contributor.author | Mochtak, Michal |
dc.contributor.author | Mölder, Martin |
dc.contributor.author | Montemagni, Simonetta |
dc.contributor.author | Navarretta, Costanza |
dc.contributor.author | Nitoń, Bartłomiej |
dc.contributor.author | Norén, Fredrik Mohammadi |
dc.contributor.author | Nwadukwe, Amanda |
dc.contributor.author | Ojsteršek, Mihael |
dc.contributor.author | Pančur, Andrej |
dc.contributor.author | Papavassiliou, Vassilis |
dc.contributor.author | Pereira, Rui |
dc.contributor.author | Pérez Lago, María |
dc.contributor.author | Piperidis, Stelios |
dc.contributor.author | Pirker, Hannes |
dc.contributor.author | Pisani, Marilina |
dc.contributor.author | Pol, Henk van der |
dc.contributor.author | Prokopidis, Prokopis |
dc.contributor.author | Quochi, Valeria |
dc.contributor.author | Rayson, Paul |
dc.contributor.author | Regueira, Xosé Luís |
dc.contributor.author | Rii, Andriana |
dc.contributor.author | Rudolf, Michał |
dc.contributor.author | Ruisi, Manuela |
dc.contributor.author | Rupnik, Peter |
dc.contributor.author | Schopper, Daniel |
dc.contributor.author | Simov, Kiril |
dc.contributor.author | Sinikallio, Laura |
dc.contributor.author | Skubic, Jure |
dc.contributor.author | Tungland, Lars Magne |
dc.contributor.author | Tuominen, Jouni |
dc.contributor.author | van Heusden, Ruben |
dc.contributor.author | Varga, Zsófia |
dc.contributor.author | Vázquez Abuín, Marta |
dc.contributor.author | Venturi, Giulia |
dc.contributor.author | Vidal Miguéns, Adrián |
dc.contributor.author | Vider, Kadri |
dc.contributor.author | Vivel Couso, Ainhoa |
dc.contributor.author | Vladu, Adina Ioana |
dc.contributor.author | Wissik, Tanja |
dc.contributor.author | Yrjänäinen, Väinö |
dc.contributor.author | Zevallos, Rodolfo |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2024-06-04T18:45:24Z |
dc.date.available | 2024-06-04T18:45:24Z |
dc.date.issued | 2024-06-03 |
dc.identifier.uri | http://hdl.handle.net/11356/1912 |
dc.description | ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora comprise between 9 and 126 million words, and the complete set contains over 1.2 billion words. The transcriptions are divided by day, with information on the term, session and meeting, and contain speeches marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. The corpora have extensive metadata, most importantly on speakers (name, gender, MP and minister status, party affiliation) and on their political parties and parliamentary groups (name, coalition/opposition status, Wikipedia-sourced left-to-right political orientation, and CHES variables, https://www.chesdata.eu/). Note that some corpora have further metadata, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The transcriptions are also marked with the subcorpora they belong to ("reference", until 2020-01-30, "covid", from 2020-01-31, and "war", from 2022-02-24). An overview of the statistics of the corpora is available on GitHub in the folder Build/Metadata, in particular for release 4.1 at https://github.com/clarin-eric/ParlaMint/tree/v4.1/Build/Metadata. The corpora are encoded according to the ParlaMint encoding guidelines (https://clarin-eric.github.io/ParlaMint/) and schemas (included in the distribution). This entry contains the ParlaMint TEI-encoded corpora and their derived plain-text versions, along with TSV metadata of the speeches. Also included is the 4.1 release of the sample data and scripts available at the GitHub repository of the ParlaMint project at https://github.com/clarin-eric/ParlaMint.
Note that there is also a linguistically marked-up version of the ParlaMint 4.1 corpus (http://hdl.handle.net/11356/1911), as well as a version machine-translated into English (http://hdl.handle.net/11356/1910). Both are linked to the CLARIN.SI concordancers for online analysis. Compared to the previous version 4.0, this version fixes a number of bugs and restructures the ParlaMint GitHub repository. In the DK corpus, speeches are now also marked with topics. The PT corpus has been extended to 2024-03 and the UA corpus to 2023-11; UA also has improved language marking (uk vs. ru) on segments. |
dc.language.iso | bul |
dc.language.iso | hrv |
dc.language.iso | pol |
dc.language.iso | slv |
dc.language.iso | ces |
dc.language.iso | isl |
dc.language.iso | fra |
dc.language.iso | nld |
dc.language.iso | dan |
dc.language.iso | spa |
dc.language.iso | tur |
dc.language.iso | eng |
dc.language.iso | ita |
dc.language.iso | hun |
dc.language.iso | lav |
dc.language.iso | bos |
dc.language.iso | cat |
dc.language.iso | deu |
dc.language.iso | ell |
dc.language.iso | est |
dc.language.iso | por |
dc.language.iso | srp |
dc.language.iso | swe |
dc.language.iso | ukr |
dc.language.iso | nor |
dc.language.iso | glg |
dc.language.iso | rus |
dc.language.iso | fin |
dc.language.iso | eus |
dc.publisher | CLARIN ERIC |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-024-09798-w |
dc.relation.replaces | http://hdl.handle.net/11356/1859 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.clarin.eu/content/parlamint |
dc.subject | parliamentary debates |
dc.subject | COVID-19 |
dc.subject | TEI |
dc.subject | Parla-CLARIN |
dc.subject | Czech Parliament |
dc.subject | Icelandic Parliament |
dc.subject | Belgian Parliament |
dc.subject | Danish Parliament |
dc.subject | Dutch Parliament |
dc.subject | Turkish Parliament |
dc.subject | Italian Parliament |
dc.subject | Hungarian Parliament |
dc.subject | Latvian Parliament |
dc.subject | Bulgarian Parliament |
dc.subject | Croatian Parliament |
dc.subject | Polish Parliament |
dc.subject | Slovenian Parliament |
dc.subject | French Parliament |
dc.subject | Austrian Parliament |
dc.subject | Bosnian Parliament |
dc.subject | Catalonian Parliament |
dc.subject | Galician Parliament |
dc.subject | Greek Parliament |
dc.subject | Norwegian Parliament |
dc.subject | Serbian Parliament |
dc.subject | Swedish Parliament |
dc.subject | Ukrainian Parliament |
dc.subject | Finnish Parliament |
dc.subject | Spanish Parliament |
dc.subject | Estonian Parliament |
dc.subject | Basque Parliament |
dc.subject | Portuguese Parliament |
dc.subject | UK Parliament |
dc.title | Multilingual comparable corpora of parliamentary debates ParlaMint 4.1 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://github.com/clarin-eric/ParlaMint/ |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
contact.person | Matyáš Kopp kopp@ufal.mff.cuni.cz Charles University |
sponsor | CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other |
sponsor | Austrian Academy of Sciences - ÖAW nationalFunds |
sponsor | European Commission POIR.04.02.00-00C002/19 European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure Other |
sponsor | Dutch Language Institute - - nationalFunds |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2023062 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds |
sponsor | Department of Nordic Studies and Linguistics (NorS), University of Copenhagen CLARIN-DK CLARIN-DK nationalFunds |
sponsor | Galician Language Institute, University of Santiago de Compostela - - ownFunds |
sponsor | Xunta de Galicia - University of Santiago de Compostela 2021-CP080 Nós: Galician in the society and economy of artificial intelligence (2021-CP080), agreement between Xunta de Galicia and the University of Santiago de Compostela nationalFunds |
sponsor | Hungarian Research Centre for Linguistics - - nationalFunds |
sponsor | National Library of Norway - - nationalFunds |
sponsor | Institute of Computer Science, Polish Academy of Sciences - statutory research nationalFunds |
sponsor | Polish Ministry of Education and Science 2022/WK/09 National contribution to CLARIN ERIC – European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure 2022–2023 (CLARIN Q) nationalFunds |
sponsor | Fundação para a Ciência e a Tecnologia UIDP/00214/2020 - nationalFunds |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | Nederlandse Organisatie voor Wetenschappelijk Onderwijs CISC.CC.016 Access to City Councils using Exploratory Search Systems nationalFunds |
sponsor | Bulgarian Ministry of Education and Science DO1-301/17.12.21 Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH nationalFunds |
sponsor | Institute for Language and Speech Processing / ATHENA RC - - nationalFunds |
sponsor | ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds |
sponsor | The Árni Magnússon Institute for Icelandic Studies - - ownFunds |
sponsor | Slovenian Research Agency (ARRS) P6-0436 Basic national research program 'Digital Humanities' (2022-2027) nationalFunds |
sponsor | ARRS (Slovenian Research Agency) N6-0099 Flemish-Slovenian bilateral basic research project ‘Linguistic landscape of hate speech online’ (2019-2023) nationalFunds |
sponsor | ARRS (Slovenian Research Agency) N6-0288 the MSCA Seal of Excellence postdoctoral project 'The Changing Discursive Semantics of EU Representations' (2022-2024) nationalFunds |
sponsor | Ministry of Science and Innovation of Spain - - nationalFunds |
sponsor | HiTZ - Ixa Group (UPV/EHU) - - Other |
size.info | 8073406 utterances |
size.info | 1231036093 words |
files.count | 30 |
files.size | 6305182537 |
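The description states that each transcription is marked with the subcorpus it belongs to, based on its date ("reference" until 2020-01-30, "covid" from 2020-01-31, "war" from 2022-02-24). The following is a minimal Python sketch of that date rule, assuming the "covid" and "war" ranges are open-ended as written, so that days on or after 2022-02-24 carry both labels. The function `subcorpora` is a hypothetical helper, not part of the ParlaMint scripts; the authoritative labels are the ones encoded in the TEI files themselves.

```python
from datetime import date

# Subcorpus boundaries as given in the ParlaMint 4.1 description.
COVID_START = date(2020, 1, 31)
WAR_START = date(2022, 2, 24)


def subcorpora(day: date) -> list[str]:
    """Return the subcorpus labels for a sitting day.

    Assumption: "covid" and "war" have no stated end date, so a day on or
    after 2022-02-24 is treated as belonging to both.
    """
    if day < COVID_START:
        return ["reference"]
    labels = ["covid"]
    if day >= WAR_START:
        labels.append("war")
    return labels


print(subcorpora(date(2019, 5, 14)))  # ['reference']
print(subcorpora(date(2021, 3, 2)))   # ['covid']
print(subcorpora(date(2023, 6, 1)))   # ['covid', 'war']
```

If the corpus turns out to assign mutually exclusive labels, only the `"covid"` branch needs to change.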
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).



- Name
- ParlaMint-AT.tgz
- Size
- 315.2 MB
- Format
- Unknown
- Description
- Austrian corpus
- MD5
- e3bfc9b090cffe16cb6bfc36c9672acb

- Name
- ParlaMint-BA.tgz
- Size
- 83.05 MB
- Format
- Unknown
- Description
- Bosnian corpus
- MD5
- a131af9736febc035dbea81eea75f891

- Name
- ParlaMint-BE.tgz
- Size
- 218.82 MB
- Format
- Unknown
- Description
- Belgian corpus
- MD5
- 0dbb14391f4f852a5fe6a8f92e0eadc0

- Name
- ParlaMint-BG.tgz
- Size
- 143.25 MB
- Format
- Unknown
- Description
- Bulgarian corpus
- MD5
- 028d0f28acc6d8defb96d1b7c4097916

- Name
- ParlaMint-CZ.tgz
- Size
- 173.51 MB
- Format
- Unknown
- Description
- Czech corpus
- MD5
- 46df2114a02e7a0ee64937f85aa8326a

- Name
- ParlaMint-DK.tgz
- Size
- 162.22 MB
- Format
- Unknown
- Description
- Danish corpus
- MD5
- 36dbe59c3ea2f35c93d4c13326df0c68

- Name
- ParlaMint-EE.tgz
- Size
- 126.58 MB
- Format
- Unknown
- Description
- Estonian corpus
- MD5
- fa23afe7f2961775daf7d24e8819cd38

- Name
- ParlaMint-ES.tgz
- Size
- 83.6 MB
- Format
- Unknown
- Description
- Spanish corpus
- MD5
- cb3cabe66670b21e3c1a3f38135629fd

- Name
- ParlaMint-ES-CT.tgz
- Size
- 66.8 MB
- Format
- Unknown
- Description
- Catalan corpus
- MD5
- 7df365180f9aec3c59d365b2cd8cbb98

- Name
- ParlaMint-ES-GA.tgz
- Size
- 76.88 MB
- Format
- Unknown
- Description
- Galician corpus
- MD5
- 72d4129052e5c9b1cef10d4ce3725608

- Name
- ParlaMint-ES-PV.tgz
- Size
- 61.32 MB
- Format
- Unknown
- Description
- Basque corpus
- MD5
- 4c8b1065ab22e81855b261d3fcb79d84

- Name
- ParlaMint-FI.tgz
- Size
- 111.42 MB
- Format
- Unknown
- Description
- Finnish corpus
- MD5
- 39e204e60276cf079a6f0693e89770fb

- Name
- ParlaMint-FR.tgz
- Size
- 226.83 MB
- Format
- Unknown
- Description
- French corpus
- MD5
- 34b026f9fc046634c87bd16fcb47a9e6

- Name
- ParlaMint-GB.tgz
- Size
- 516.51 MB
- Format
- Unknown
- Description
- British corpus
- MD5
- 62d557844e827030181b18b912bf171e

- Name
- ParlaMint-GR.tgz
- Size
- 310.23 MB
- Format
- Unknown
- Description
- Greek corpus
- MD5
- 8610d845afee472470d66276d3a041c2

- Name
- ParlaMint-HR.tgz
- Size
- 396.2 MB
- Format
- Unknown
- Description
- Croatian corpus
- MD5
- f8d8308fd03d9fda5bf9647fbff8f85e

- Name
- ParlaMint-HU.tgz
- Size
- 169.7 MB
- Format
- Unknown
- Description
- Hungarian corpus
- MD5
- fd2508739a9da655b600399854773e01

- Name
- ParlaMint-IS.tgz
- Size
- 138.47 MB
- Format
- Unknown
- Description
- Icelandic corpus
- MD5
- 7f57d3ec0eb79d17744a81a1286b1ee8

- Name
- ParlaMint-IT.tgz
- Size
- 145.38 MB
- Format
- Unknown
- Description
- Italian corpus
- MD5
- 1f7f27f1291790fa037626994b53b13d

- Name
- ParlaMint-LV.tgz
- Size
- 50.42 MB
- Format
- Unknown
- Description
- Latvian corpus
- MD5
- a62ca8abd0d644dc38da5f5396b693c2

- Name
- ParlaMint-NL.tgz
- Size
- 306.08 MB
- Format
- Unknown
- Description
- Dutch corpus
- MD5
- bec935658be48fd0fe172761cbf56fd8

- Name
- ParlaMint-NO.tgz
- Size
- 406.15 MB
- Format
- Unknown
- Description
- Norwegian corpus
- MD5
- 822cf6cb16f27f79c782e44d965c989f

- Name
- ParlaMint-PL.tgz
- Size
- 191.83 MB
- Format
- Unknown
- Description
- Polish corpus
- MD5
- 4c2b5fcb232c03b2a33d310f9c1abe3b

- Name
- ParlaMint-PT.tgz
- Size
- 111.17 MB
- Format
- Unknown
- Description
- Portuguese corpus
- MD5
- 0e0b305626849088f91a00434f6b52f1

- Name
- ParlaMint-RS.tgz
- Size
- 361.17 MB
- Format
- Unknown
- Description
- Serbian corpus
- MD5
- dcceb0f7a967911e0a7a6aea97b1a251

- Name
- ParlaMint-SE.tgz
- Size
- 147.08 MB
- Format
- Unknown
- Description
- Swedish corpus
- MD5
- c2943ca18b577872d2e0dc02145f79c6

- Name
- ParlaMint-SI.tgz
- Size
- 319.49 MB
- Format
- Unknown
- Description
- Slovenian corpus
- MD5
- d9b9b8ff2c1535662805048151afe124

- Name
- ParlaMint-TR.tgz
- Size
- 280.68 MB
- Format
- Unknown
- Description
- Turkish corpus
- MD5
- a24928ab6402b4d8a1322b67db2ce00b

- Name
- ParlaMint-UA.tgz
- Size
- 294.27 MB
- Format
- Unknown
- Description
- Ukrainian corpus
- MD5
- f39dcc20d8587cbc8d28d45222bdad88

- Name
- ParlaMint-4.1.tgz
- Size
- 18.77 MB
- Format
- Unknown
- Description
- https://github.com/clarin-eric/ParlaMint/releases/tag/v4.1 (samples, schemas, scripts)
- MD5
- 91929b37c965a5c6591b1cf2eda271ea
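Each archive in the listing comes with an MD5 checksum. The following is a minimal Python sketch for checking downloads against these values, assuming the archives have been saved under their listed names in a single local directory. `EXPECTED_MD5`, `md5_of` and `verify` are hypothetical helpers, not part of the ParlaMint distribution, and only two entries from the listing are filled in as an example.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing; add further entries as needed.
EXPECTED_MD5 = {
    "ParlaMint-AT.tgz": "e3bfc9b090cffe16cb6bfc36c9672acb",
    "ParlaMint-4.1.tgz": "91929b37c965a5c6591b1cf2eda271ea",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> dict[str, bool]:
    """Check every expected archive that is present in `directory`.

    Archives that have not been downloaded are skipped, not reported.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        if path.exists():
            results[name] = md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(Path(".")).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

Reading in chunks keeps memory use constant even for the larger archives (for example, the 516.51 MB British corpus).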