dc.contributor.author | Mochtak, Michal |
dc.contributor.author | Rupnik, Peter |
dc.contributor.author | Ljubešić, Nikola |
dc.date.accessioned | 2022-06-08T08:33:44Z |
dc.date.available | 2022-06-08T08:33:44Z |
dc.date.issued | 2022-06-08 |
dc.identifier.uri | http://hdl.handle.net/11356/1585 |
dc.description | The dataset consists of mid-length sentences from Bosnian, Croatian and Serbian parliamentary proceedings, annotated with a 6-level sentiment schema (defined below). The first 1,300 instances were annotated by two annotators, and a reconciliation procedure was performed whenever they disagreed on the simplified 3-level schema (Positive, Negative, Neutral). The remaining 1,300 instances were annotated by the second annotator only. In addition to the annotations of the two annotators and any reconciliation annotations, a 3-level label is available for all instances. Each sentence can be traced back to the original datasets (https://doi.org/10.5281/zenodo.6517697, https://doi.org/10.5281/zenodo.6521372, https://doi.org/10.5281/zenodo.6521648) via a document and sentence identifier. The date of the speech and the speaker's name are given as well. If the speaker is an MP, information on party, gender and year of birth is also available. The dataset is split into training (2,150 instances), development (150 instances) and test (300 instances) subsets. The full 6-level annotation schema is as follows: - Positive for sentences that are entirely or predominantly positive - Negative for sentences that are entirely or predominantly negative - M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment in a strict binary classification - M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment in a strict binary classification - P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment in a strict binary classification - N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment in a strict binary classification |
dc.language.iso | bos |
dc.language.iso | hrv |
dc.language.iso | srp |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.clarin.eu/parlamint |
dc.subject | sentiment classification |
dc.subject | parliamentary debates |
dc.title | The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://huggingface.co/classla/bcms-bertic-parlasent-bcs-ter |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds |
size.info | 2600 sentences |
files.count | 1 |
files.size | 1187107 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: ParlaSent-BCS.jsonl
- Size: 1.13 MB
- Format: Unknown
- Description: JSONL dataset
- MD5: 8617eac2b69bf9198e6566b379d80833
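
The MD5 checksum and byte size listed in this record can be used to check a downloaded copy of ParlaSent-BCS.jsonl. The following is a minimal Python sketch, assuming the file has already been saved locally; the function names `md5_of_file` and `verify_download` are hypothetical helpers introduced here for illustration and are not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# Values copied from this record (MD5 and files.size fields).
EXPECTED_MD5 = "8617eac2b69bf9198e6566b379d80833"
EXPECTED_SIZE = 1187107  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte size and the MD5 digest match.

    Hypothetical helper: the defaults are the values stated in this record.
    """
    path = Path(path)
    size_ok = path.stat().st_size == expected_size
    md5_ok = md5_of_file(path) == expected_md5.lower()
    return size_ok and md5_ok
```

Calling `verify_download("ParlaSent-BCS.jsonl")` on a local copy (the path is an assumption about where the file was saved) returns `True` only when both values agree with the record. MD5 is suitable here for detecting a corrupted or truncated download, not as a security guarantee.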