Parliamentary ParlaCAP Dataset and CAP Topic Classifier
We are pleased to announce the release of the ParlaCAP dataset: an extension of the ParlaMint 5.0 collection enriched with sentiment and topic annotations, as well as extended metadata on parties and democracies.
The dataset contains around 8 million speeches from 28 European parliaments and is provided in a tabular format, which makes the ParlaMint corpora easier to use for social and political science research. As part of the OSCARS ParlaCAP project, the dataset was published through CROSSDA, the Croatian CESSDA node, thereby promoting collaboration between infrastructures. We also released the multilingual topic classifier, which uses the CAP (Comparative Agendas Project) labels, along with tutorials for analysing ParlaCAP data in Python. More information is available here.