
 
dc.contributor.author Kuzman, Taja
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Kopp, Matyáš
dc.contributor.author Ogrodniczuk, Maciej
dc.contributor.author Osenova, Petya
dc.contributor.author Rayson, Paul
dc.contributor.author Vidler, John
dc.contributor.author Agerri, Rodrigo
dc.contributor.author Agirrezabal, Manex
dc.contributor.author Agnoloni, Tommaso
dc.contributor.author Aires, José
dc.contributor.author Albini, Monica
dc.contributor.author Alkorta, Jon
dc.contributor.author Antiba-Cartazo, Iván
dc.contributor.author Arrieta, Ekain
dc.contributor.author Barcala, Mario
dc.contributor.author Bardanca, Daniel
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Bartolini, Roberto
dc.contributor.author Battistoni, Roberto
dc.contributor.author Bel, Nuria
dc.contributor.author Bonet Ramos, Maria del Mar
dc.contributor.author Calzada Pérez, María
dc.contributor.author Cardoso, Aida
dc.contributor.author Çöltekin, Çağrı
dc.contributor.author Coole, Matthew
dc.contributor.author Darģis, Roberts
dc.contributor.author de Does, Jesse
dc.contributor.author de Libano, Ruben
dc.contributor.author Depoorter, Griet
dc.contributor.author Depuydt, Katrien
dc.contributor.author Diwersy, Sascha
dc.contributor.author Dodé, Réka
dc.contributor.author Fernandez, Kike
dc.contributor.author Fernández Rei, Elisa
dc.contributor.author Frontini, Francesca
dc.contributor.author Garcia, Marcos
dc.contributor.author García Díaz, Noelia
dc.contributor.author García Louzao, Pedro
dc.contributor.author Gavriilidou, Maria
dc.contributor.author Gkoumas, Dimitris
dc.contributor.author Grigorov, Ilko
dc.contributor.author Grigorova, Vladislava
dc.contributor.author Haltrup Hansen, Dorte
dc.contributor.author Iruskieta, Mikel
dc.contributor.author Jarlbrink, Johan
dc.contributor.author Jelencsik-Mátyus, Kinga
dc.contributor.author Jongejan, Bart
dc.contributor.author Kahusk, Neeme
dc.contributor.author Kirnbauer, Martin
dc.contributor.author Kryvenko, Anna
dc.contributor.author Ligeti-Nagy, Noémi
dc.contributor.author Luxardo, Giancarlo
dc.contributor.author Magariños, Carmen
dc.contributor.author Magnusson, Måns
dc.contributor.author Marchetti, Carlo
dc.contributor.author Marx, Maarten
dc.contributor.author Meden, Katja
dc.contributor.author Mendes, Amália
dc.contributor.author Mochtak, Michal
dc.contributor.author Mölder, Martin
dc.contributor.author Montemagni, Simonetta
dc.contributor.author Navarretta, Costanza
dc.contributor.author Nitoń, Bartłomiej
dc.contributor.author Norén, Fredrik Mohammadi
dc.contributor.author Nwadukwe, Amanda
dc.contributor.author Ojsteršek, Mihael
dc.contributor.author Pančur, Andrej
dc.contributor.author Papavassiliou, Vassilis
dc.contributor.author Pereira, Rui
dc.contributor.author Pérez Lago, María
dc.contributor.author Piperidis, Stelios
dc.contributor.author Pirker, Hannes
dc.contributor.author Pisani, Marilina
dc.contributor.author Pol, Henk van der
dc.contributor.author Prokopidis, Prokopis
dc.contributor.author Quochi, Valeria
dc.contributor.author Regueira, Xosé Luís
dc.contributor.author Rii, Andriana
dc.contributor.author Rudolf, Michał
dc.contributor.author Ruisi, Manuela
dc.contributor.author Rupnik, Peter
dc.contributor.author Schopper, Daniel
dc.contributor.author Simov, Kiril
dc.contributor.author Sinikallio, Laura
dc.contributor.author Skubic, Jure
dc.contributor.author Tamper, Minna
dc.contributor.author Tungland, Lars Magne
dc.contributor.author Tuominen, Jouni
dc.contributor.author van Heusden, Ruben
dc.contributor.author Varga, Zsófia
dc.contributor.author Vázquez Abuín, Marta
dc.contributor.author Venturi, Giulia
dc.contributor.author Vidal Miguéns, Adrián
dc.contributor.author Vider, Kadri
dc.contributor.author Vivel Couso, Ainhoa
dc.contributor.author Vladu, Adina Ioana
dc.contributor.author Wissik, Tanja
dc.contributor.author Yrjänäinen, Väinö
dc.contributor.author Zevallos, Rodolfo
dc.contributor.author Fišer, Darja
dc.date.accessioned 2024-06-04T18:48:58Z
dc.date.available 2024-06-04T18:48:58Z
dc.date.issued 2024-06-03
dc.identifier.uri http://hdl.handle.net/11356/1910
dc.description ParlaMint-en.ana 4.1 is the English machine translation of the ParlaMint.ana 4.1 (http://hdl.handle.net/11356/1911) set of corpora of parliamentary debates across Europe. The translation is linguistically annotated similarly to the original language corpora (but without UD syntax), and with the addition of USAS semantic tags (https://ucrel.lancs.ac.uk/usas/). Because of the addition of semantic tags the UK corpus (ParlaMint-GB) is also included. The translation to English was done with EasyNMT (https://github.com/UKPLab/EasyNMT) using OPUS-MT models (https://github.com/Helsinki-NLP/Opus-MT). Machine translation was done on the sentence level, and includes both speeches and transcriber notes, including headings. Note that corpus metadata is mostly available both in the source language and in English. The linguistic annotation of the speeches, i.e. tokenisation, tagging with UD PoS and morphological features, lemmatisation, and NER annotation was done with Stanza (https://stanfordnlp.github.io/stanza/) using the conll03 model (4 classes). The annotation of MWEs (phrases) and tokens with USAS tags was done with pyMusas (https://github.com/ucrel/pymusas). Note that the English in the corpora contains typical NMT errors, including factual errors even when high fluency is achieved, and any use of this corpus should take the machine translation limitations into account. The files associated with this entry include the machine translated and linguistically annotated corpora in several formats: the corpora in the canonical ParlaMint TEI XML encoding; the corpora in the derived vertical format (for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText); and the corpora in the CoNLL-U format with TSV speech metadata. The CoNLL-U files include pyMusas USAS tags. 
Also included are the 4.1 release of the sample data and scripts, available at the GitHub repository of the ParlaMint project (https://github.com/clarin-eric/ParlaMint), and the log files produced while building the corpora for this release. The log files show, for example, known errors in the corpora; more information about known problems is available in the (open) issues at the project's GitHub repository. Compared with the previous version 4.0, this version fixes a number of bugs and restructures the ParlaMint GitHub repository. In the DK corpus, speeches are now also marked with topics. The PT corpus has been extended to 2024-03 and the UA corpus to 2023-11; the UA corpus also has improved language marking (uk vs. ru) on segments.
dc.language.iso eng
dc.publisher CLARIN ERIC
dc.relation.isreferencedby https://doi.org/10.1007/s10579-024-09798-w
dc.relation.replaces http://hdl.handle.net/11356/1864
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/content/parlamint
dc.subject Parla-CLARIN
dc.subject parliamentary debates
dc.subject COVID-19
dc.subject TEI
dc.subject Bulgarian Parliament
dc.subject Croatian Parliament
dc.subject Polish Parliament
dc.subject Slovenian Parliament
dc.subject Czech Parliament
dc.subject Icelandic Parliament
dc.subject Belgian Parliament
dc.subject Danish Parliament
dc.subject Dutch Parliament
dc.subject Turkish Parliament
dc.subject Italian Parliament
dc.subject Hungarian Parliament
dc.subject Latvian Parliament
dc.subject French Parliament
dc.subject Bosnian Parliament
dc.subject Catalonian Parliament
dc.subject Galician Parliament
dc.subject Greek Parliament
dc.subject Norwegian Parliament
dc.subject Portuguese Parliament
dc.subject Serbian Parliament
dc.subject Swedish Parliament
dc.subject Ukrainian Parliament
dc.subject Austrian Parliament
dc.subject Estonian Parliament
dc.subject Spanish Parliament
dc.subject Finnish Parliament
dc.subject Basque Parliament
dc.subject British Parliament
dc.title Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarin-eric/ParlaMint/
contact.person Matyáš Kopp kopp@ufal.mff.cuni.cz Charles University
contact.person Taja Kuzman taja.kuzman@ijs.si Jožef Stefan Institute
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor Austrian Academy of Sciences - ÖAW nationalFunds
sponsor European Commission POIR.04.02.00-00C002/19 European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure Other
sponsor Dutch Language Institute - - nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2023062 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Department of Nordic Studies and Linguistics (NorS), University of Copenhagen CLARIN-DK CLARIN-DK nationalFunds
sponsor Galician Language Institute, University of Santiago de Compostela - - ownFunds
sponsor Xunta de Galicia - University of Santiago de Compostela 2021-CP080 Nós: Galician in the society and economy of artificial intelligence (2021-CP080), agreement between Xunta de Galicia and the University of Santiago de Compostela nationalFunds
sponsor Hungarian Research Centre for Linguistics - - nationalFunds
sponsor National Library of Norway - - nationalFunds
sponsor Institute of Computer Science, Polish Academy of Sciences - statutory research nationalFunds
sponsor Polish Ministry of Education and Science 2022/WK/09 National contribution to CLARIN ERIC – European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure 2022–2023 (CLARIN Q) nationalFunds
sponsor Fundação para a Ciência e a Tecnologia UIDP/00214/2020 - nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Nederlandse Organisatie voor Wetenschappelijk Onderwijs CISC.CC.016 Access to City Councils using Exploratory Search Systems nationalFunds
sponsor Bulgarian Ministry of Education and Science DO1-301/17.12.21 Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH nationalFunds
sponsor Institute for Language and Speech Processing / ATHENA RC - - nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
sponsor The Árni Magnússon Institute for Icelandic Studies - - ownFunds
sponsor Slovenian Research Agency (ARRS) P6-0436 Basic national research program 'Digital Humanities' (2022-2027) nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0099 Flemish-Slovenian bilateral basic research project ‘Linguistic landscape of hate speech online’ (2019-2023) nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0288 the MSCA Seal of Excellence postdoctoral project 'The Changing Discursive Semantics of EU Representations' (2022-2024) nationalFunds
sponsor Ministry of Science and Innovation of Spain - - nationalFunds
sponsor HiTZ - Ixa Group (UPV/EHU) - - Other
size.info 8132022 utterances
size.info 1364870493 words
files.count 31
files.size 57298306705
featuredService.noske search MTed corpora|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_xx_en
featuredService.noske search original corpora|https://www.clarin.si/ske/#dashboard?corpname=parlamint40_xx
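
The description above states that the corpora are also distributed in the CoNLL-U format. As a minimal sketch of reading such files, the Python snippet below splits CoNLL-U text into sentences of token records using the ten standard tab-separated CoNLL-U columns. The function name read_conllu and the sample sentences are hypothetical and invented for illustration; they are not taken from the corpus, and where exactly the USAS tags are stored in the ParlaMint files is not stated here, so the sketch does not interpret them.

```python
# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines):
    """Yield sentences as lists of token dicts from an iterable of CoNLL-U lines."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            # Lines starting with '#' are sentence-level comments; skip them.
            sentence.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if sentence:
        yield sentence


# Invented sample for illustration only (not corpus data).
SAMPLE = (
    "# sent_id = s1\n"
    "1\tThank\tthank\tVERB\t_\t_\t_\t_\t_\t_\n"
    "2\tyou\tyou\tPRON\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = s2\n"
    "1\tOrder\torder\tNOUN\t_\t_\t_\t_\t_\t_\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
lemmas = [[token["lemma"] for token in sent] for sent in sentences]
```

For real files, the same function can be applied to an open file handle, for example read_conllu(open(path, encoding="utf-8")), where path is a CoNLL-U file extracted from one of the archives.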


 Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons; attribution required.
Name                        Size       Format   MD5                               Description
ParlaMint-AT-en.ana.tgz     2.67 GB    Unknown  a58f626ca2cba043052f34a5daea74f7  Austrian corpus
ParlaMint-BA-en.ana.tgz     826.67 MB  Unknown  8b9d20dafe102c89800f7e3a5aadb2bb  Bosnian corpus
ParlaMint-BE-en.ana.tgz     1.7 GB     Unknown  b89acc55cc8703790177d4e1147b6fc1  Belgian corpus
ParlaMint-BG-en.ana.tgz     1.12 GB    Unknown  ed2a385aee91e2d0e21fef76d426e237  Bulgarian corpus
ParlaMint-CZ-en.ana.tgz     1.47 GB    Unknown  531d048ba755a1e5d8c4e615982722a5  Czech corpus
ParlaMint-DK-en.ana.tgz     1.6 GB     Unknown  68b4e212677a2aedca775ff6a5b7a123  Danish corpus
ParlaMint-EE-en.ana.tgz     1.22 GB    Unknown  07e5873214aa745f849c4e714cd73679  Estonian corpus
ParlaMint-ES-en.ana.tgz     784.8 MB   Unknown  935a2a93a2e8ccabc24e8ece5a700295  Spanish corpus
ParlaMint-ES-CT-en.ana.tgz  634.32 MB  Unknown  c5d8a52b6a98c6233317cfed9b3004d7  Catalan corpus
ParlaMint-ES-GA-en.ana.tgz  747.34 MB  Unknown  2ccfba306f8078867286334a90ad1119  Galician corpus
ParlaMint-ES-PV-en.ana.tgz  563.72 MB  Unknown  2d2a6f0a4a2a82a1b9678ce348c08ac7  Basque corpus
ParlaMint-FI-en.ana.tgz     790.64 MB  Unknown  4d69333bad7bde7f4647e468c2635b75  Finnish corpus
ParlaMint-FR-en.ana.tgz     1.88 GB    Unknown  46abf342dd8133eb6943ce44c9ef8fbf  French corpus
ParlaMint-GB-en.ana.tgz     4.86 GB    Unknown  757886ea8fd220c473a238de301f4171  British corpus
ParlaMint-GR-en.ana.tgz     2.14 GB    Unknown  7609ed5e86552b11f4fb1a2d87b5c540  Greek corpus
ParlaMint-HR-en.ana.tgz     3.86 GB    Unknown  8146dcd9228fd6154493cfd751727afd  Croatian corpus
ParlaMint-HU-en.ana.tgz     1.49 GB    Unknown  4cc90667be72bd56bc85135c68d7cb78  Hungarian corpus
ParlaMint-IS-en.ana.tgz     1.27 GB    Unknown  57d899285eb992b6cf1da26b5f90275c  Icelandic corpus
ParlaMint-IT-en.ana.tgz     1.33 GB    Unknown  60b5c7b51d3a0ff78d3788ee9051d2e9  Italian corpus
ParlaMint-LV-en.ana.tgz     491.39 MB  Unknown  5f025014b84cbc77b39c24fa5691146a  Latvian corpus
ParlaMint-NL-en.ana.tgz     2.65 GB    Unknown  46e88465bcbed4d17d693164565f4522  Dutch corpus
ParlaMint-NO-en.ana.tgz     3.84 GB    Unknown  8cbca2c1259add2689066344cc96b00b  Norwegian corpus
ParlaMint-PL-en.ana.tgz     1.68 GB    Unknown  2f4889dd844f0605041bff0ec05f757f  Polish corpus
ParlaMint-PT-en.ana.tgz     981.91 MB  Unknown  ec0aa88a221fb508b26930cb4e4cb5c3  Portuguese corpus
ParlaMint-RS-en.ana.tgz     3.63 GB    Unknown  8337bb60e6c6e8335b9669a2a58a4a4b  Serbian corpus
ParlaMint-SE-en.ana.tgz     1.36 GB    Unknown  3fb4a80780a806f8f4baaf900db332d7  Swedish corpus
ParlaMint-SI-en.ana.tgz     3.19 GB    Unknown  8c3180a55187659fbd75245a08265c7d  Slovenian corpus
ParlaMint-TR-en.ana.tgz     2.54 GB    Unknown  1a3b308c7d228ff7410e7cdf7483cdd6  Turkish corpus
ParlaMint-UA-en.ana.tgz     2.13 GB    Unknown  b0657d56e1441c2aa1e454c204be2efa  Ukrainian corpus
ParlaMint-4.1.tgz           18.77 MB   Unknown  91929b37c965a5c6591b1cf2eda271ea  https://github.com/clarin-eric/ParlaMint/releases/tag/v4.1 (samples, schemas, scripts)
ParlaMint-4.1-Logs.tgz      23.36 MB   Unknown  4c2f2b7d5394eceab9f7dbf5a217b55a  Build log files of the corpora
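
Each file in this item is listed with an MD5 checksum, which can be used to check that a download completed without corruption. The following Python sketch shows one way to do this with the standard hashlib module. The function names md5_of_file and verify_download are hypothetical helpers introduced here for illustration and are not part of the ParlaMint scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the checksum listed in the record."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download("ParlaMint-LV-en.ana.tgz", "5f025014b84cbc77b39c24fa5691146a") should return True for an intact copy of the Latvian archive, assuming the file was saved under its listed name in the current directory. Reading in chunks matters here because several archives are multiple gigabytes in size.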
