Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije Common Language Resources and Technology Infrastructure, Slovenia

Parliamentary ParlaCAP Dataset and CAP Topic Classifier

Katja Meden 2025-10-21

We are pleased to announce the release of the ParlaCAP dataset: an extension of the ParlaMint 5.0 collection enriched with sentiment and topic annotations, as well as extended metadata on parties and democracies.

The dataset contains around 8 million speeches from 28 European parliaments and is provided in a tabular format, which makes the ParlaMint corpora easier to use in social and political science research. As part of the OSCARS ParlaCAP project, the dataset was published through the Croatian CESSDA node CROSSDA, thereby promoting collaboration between infrastructures. We have also released a multilingual topic classifier that uses the CAP (Comparative Agendas Project) labels, along with tutorials for analysing ParlaCAP data in Python. More information is available here.
