dc.contributor.author | Mochtak, Michal |
dc.contributor.author | Rupnik, Peter |
dc.contributor.author | Meden, Katja |
dc.contributor.author | Ljubešić, Nikola |
dc.date.accessioned | 2023-09-19T16:19:23Z |
dc.date.available | 2023-09-19T16:19:23Z |
dc.date.issued | 2023-09-18 |
dc.identifier.uri | http://hdl.handle.net/11356/1868 |
dc.description | The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema (defined below). The data coming from the parliaments of Bosnia and Herzegovina, Croatia and Serbia are organised as a single parliament group, named "BCS", due to the similarity of the official languages in these countries. For each of the five parliaments / parliament groups, 2,600 training instances were annotated by two annotators, with one additional conflict resolution step. While these training instances were sampled via sentiment lexicons to contain more sentiment-loaded sentences, two test sets were randomly sampled from selected parliaments, one from the BCS parliament group, another from the parliament of the United Kingdom. Each test set consists of 2,600 sentences, annotated by one highly trained annotator. Training datasets were internally split into "train", "dev" and "test" portions for performing language-specific experiments. The 6-level annotation schema is the following: - Positive for sentences that are entirely or predominantly positive - Negative for sentences that are entirely or predominantly negative - M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment - M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment - P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment - N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment |
dc.language.iso | bos |
dc.language.iso | hrv |
dc.language.iso | ces |
dc.language.iso | eng |
dc.language.iso | srp |
dc.language.iso | slk |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | http://arxiv.org/abs/2309.09783 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.clarin.eu/parlamint |
dc.subject | sentiment classification |
dc.subject | sentiment analysis |
dc.subject | parliamentary debates |
dc.subject | Bosnian Parliament |
dc.subject | Croatian Parliament |
dc.subject | Czech Parliament |
dc.subject | English Parliament |
dc.subject | Serbian Parliament |
dc.subject | Slovak Parliament |
dc.subject | Slovenian Parliament |
dc.title | The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://huggingface.co/classla/xlm-r-parlasent |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
sponsor | CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds |
sponsor | ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds |
size.info | 18200 sentences |
size.info | 7 files |
files.count | 8 |
files.size | 7793411 |
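Each of the six labels in the annotation schema described above combines two properties stated in the label definitions: the direction the sentence leans (positive or negative) and the kind of content it has (entirely or predominantly sentiment-bearing, mixed/ambiguous, or non-sentiment). The following minimal Python sketch encodes that decomposition. The names `SCHEMA`, `lean` and `kind` are hypothetical and not part of the dataset, and the label spellings are assumed to match those used in the description.

```python
# Hypothetical encoding of the ParlaSent 6-level schema as (lean, kind) pairs,
# read off the label definitions in dc.description.
# Assumption: label strings in the data use exactly these spellings.
SCHEMA: dict[str, tuple[str, str]] = {
    "Positive": ("positive", "predominant"),
    "Negative": ("negative", "predominant"),
    "M_Positive": ("positive", "mixed"),
    "M_Negative": ("negative", "mixed"),
    "P_Neutral": ("positive", "neutral"),
    "N_Neutral": ("negative", "neutral"),
}


def lean(label: str) -> str:
    """Return the direction a 6-level label leans towards: 'positive' or 'negative'."""
    return SCHEMA[label][0]


def kind(label: str) -> str:
    """Return 'predominant', 'mixed' or 'neutral' for a 6-level label."""
    return SCHEMA[label][1]
```

For example, `lean("P_Neutral")` gives `"positive"` and `kind("M_Negative")` gives `"mixed"`. This two-way split follows the schema text only. It is not the three-level `label` attribute listed in the README, whose exact mapping is not spelled out in this record.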
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Files in this item
Name | Size | Format | Description | MD5 |
ParlaSent_BCS.jsonl | 1.13 MB | Unknown | BCS train file | c8b59c84c476b031cc553bc3c768e627 |
ParlaSent_CZ.jsonl | 1.15 MB | Unknown | Czech train file | ff633c11f3d0e1e8fc544db0732e8104 |
ParlaSent_EN.jsonl | 1.1 MB | Unknown | English train file | 9c011abd994c14dc53afb37013fdac05 |
ParlaSent_SK.jsonl | 1.13 MB | Unknown | Slovak train file | 2e2944d8edaa2021b361e3ec3d23a5ee |
ParlaSent_BCS_test.jsonl | 948.03 KB | Unknown | BCS test file | ee8699a4a7b1a834f79fe74b8ebdfaf1 |
ParlaSent_EN_test.jsonl | 940.29 KB | Unknown | English test file | 003f0aeded7001574e79c49b09401e83 |
ParlaSent_SL.jsonl | 1.07 MB | Unknown | Slovenian train file | 1117ec542bd1812681a2fff7f0eae1e2 |
README.txt | 2.15 KB | Text file | README with attribute descriptions | 583856c8d470334e5638f6a078f727d5 |
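The MD5 checksums listed for each file can be used to check that a download is complete and uncorrupted. The following minimal Python sketch uses the standard `hashlib` module. The function names `md5_of` and `verify` are hypothetical, and it assumes the files keep their original names after download.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "ParlaSent_BCS.jsonl": "c8b59c84c476b031cc553bc3c768e627",
    "ParlaSent_CZ.jsonl": "ff633c11f3d0e1e8fc544db0732e8104",
    "ParlaSent_EN.jsonl": "9c011abd994c14dc53afb37013fdac05",
    "ParlaSent_SK.jsonl": "2e2944d8edaa2021b361e3ec3d23a5ee",
    "ParlaSent_BCS_test.jsonl": "ee8699a4a7b1a834f79fe74b8ebdfaf1",
    "ParlaSent_EN_test.jsonl": "003f0aeded7001574e79c49b09401e83",
    "ParlaSent_SL.jsonl": "1117ec542bd1812681a2fff7f0eae1e2",
    "README.txt": "583856c8d470334e5638f6a078f727d5",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str, expected: dict[str, str] = EXPECTED_MD5) -> dict[str, str]:
    """Return {filename: problem} for files that are missing or whose MD5 differs."""
    problems = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            problems[name] = "missing"
        elif md5_of(path) != md5:
            problems[name] = "checksum mismatch"
    return problems
```

Calling `verify("ParlaSent")` (a hypothetical download directory) returns an empty dict when every listed file is present and matches its checksum.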
The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0
http://hdl.handle.net/11356/1868

The dataset consists of five training datasets and two test sets. The test sets have a _test.jsonl suffix. The attributes in the training data are the following:
- sentence - the sentence labeled for sentiment
- country - the country of the parliament the sentence comes from
- annotator1 - the first annotator's annotation
- annotator2 - the second annotator's annotation
- reconciliation - the final label agreed upon after reconciliation
- label - three-level (positive, negative, neutral) label based on the reconciliation label
- document_id - internal identifier of the document the sentence comes from
- sentence_id - internal identifier of the sentence inside the document
- term - the term of the parliament the sentence comes from
- date - the date the sentence was uttered as part of a speech in the parliament
- name - name of the MP giving the speech
- party - the party of the MP
- gender . . .
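The README states that `label` is derived from the `reconciliation` label but does not spell out the mapping. One way to inspect it is to cross-tabulate the two fields. The following minimal Python sketch assumes each line of a training file is a standalone JSON object (as the .jsonl extension suggests) carrying the `reconciliation` and `label` keys listed above. `load_jsonl` and `label_crosstab` are hypothetical helper names. The attribute list above covers the training data only, so the test files may lack these keys.

```python
import json
from collections import Counter


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def label_crosstab(records: list[dict]) -> Counter:
    """Count (reconciliation, label) pairs, i.e. how 6-level labels map to 3-level ones."""
    return Counter((r["reconciliation"], r["label"]) for r in records)
```

For example, `label_crosstab(load_jsonl("ParlaSent_EN.jsonl"))` counts every (6-level, 3-level) pair in the English training file. If each `reconciliation` value occurs with only one `label` value, the mapping is deterministic for that file.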