dc.contributor.author | Ivačič, Nikola |
dc.contributor.author | Pelicon, Andraž |
dc.contributor.author | Koloski, Boshko |
dc.contributor.author | Pollak, Senja |
dc.contributor.author | Purver, Matthew |
dc.date.accessioned | 2024-11-13T11:22:31Z |
dc.date.available | 2024-11-13T11:22:31Z |
dc.date.issued | 2024-11-13 |
dc.identifier.uri | http://hdl.handle.net/11356/1987 |
dc.description | We provide news datasets annotated on a three-point sentiment scale (positive, neutral, negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For every language except Estonian, each record pairs a source URL (where the corresponding text can be found) with a sentiment label. For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408). All files are in tab-separated values (TSV) format. The Serbian, Bosnian, Macedonian, and Albanian files have two columns: sourceURL and sentiment. The Estonian file has three columns: text ID (from the CLARIN.SI archive referenced above), body text, and sentiment label. |
dc.language.iso | bos |
dc.language.iso | srp |
dc.language.iso | mkd |
dc.language.iso | sqi |
dc.language.iso | est |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://emma.ijs.si/ |
dc.subject | closely related languages |
dc.subject | sentiment analysis |
dc.subject | sentiment classification |
dc.title | News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Boshko Koloski boshko.koloski@ijs.si Jožef Stefan Institute |
contact.person | Nikola Ivačič nikola.ivacic@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) L2-50070 Embeddings-based techniques for Media Monitoring Applications nationalFunds |
size.info | 898 items |
files.count | 5 |
files.size | 356799 |
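
The column layout given in the description can be read with a few lines of Python. This is a minimal sketch, not part of the dataset: the record does not say whether the files carry a header row (hence the `has_header` switch), the names `text_id` and `body_text` are illustrative stand-ins for the Estonian columns, and the sample rows and label strings below are hypothetical.

```python
import csv
import io
from collections import Counter

# Column layouts as described in the record. The Estonian names are
# illustrative; the record only describes them as text ID, body text, label.
URL_COLUMNS = ("sourceURL", "sentiment")
ESTONIAN_COLUMNS = ("text_id", "body_text", "sentiment")


def read_tsv(handle, columns, has_header=True):
    """Read tab-separated rows into dicts keyed by `columns`.

    `has_header` is an assumption: set it to False if the file turns out
    to start directly with data.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for index, fields in enumerate(reader):
        if index == 0 and has_header:
            continue  # skip the header row
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"row {index + 1}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


def label_counts(rows):
    """Count how many rows carry each sentiment label."""
    return Counter(row["sentiment"] for row in rows)


# Hypothetical sample in the two-column layout (not real dataset rows).
sample = (
    "sourceURL\tsentiment\n"
    "https://example.org/a\tpositive\n"
    "https://example.org/b\tnegative\n"
    "https://example.org/c\tpositive\n"
)
rows = read_tsv(io.StringIO(sample), URL_COLUMNS)
counts = label_counts(rows)
```

For a downloaded file, the same function can be called on `open(path, newline="", encoding="utf-8")`; UTF-8 is likewise an assumption, as the record does not state the encoding.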
Files in this item
- Name
- macedonia_SADEmma.tsv
- Size
- 27.04 KB
- Format
- Unknown
- Description
- Dataset of 198 sentiment-annotated Macedonian news articles.
- MD5
- 51e80fd16e0433ad55897c815c806944

- Name
- estonian_SADEmma.tsv
- Size
- 263.22 KB
- Format
- Unknown
- Description
- Dataset of 100 sentiment-annotated Estonian news articles.
- MD5
- b5d19ec7df8ce2150712ed21958f6504

- Name
- albanian_SADEmma.tsv
- Size
- 20.8 KB
- Format
- Unknown
- Description
- Dataset of 200 sentiment-annotated Albanian news articles.
- MD5
- 64fa04afbf96d36d94078e5866de9048

- Name
- serbian_SADEmma.tsv
- Size
- 18.17 KB
- Format
- Unknown
- Description
- Dataset of 200 sentiment-annotated Serbian news articles.
- MD5
- 389614e7fb56b41613f1a47ced573676

- Name
- bosnian_SADEmma.tsv
- Size
- 19.21 KB
- Format
- Unknown
- Description
- Dataset of 200 sentiment-annotated Bosnian news articles.
- MD5
- 0b4bea7aeea9c0612c1f5defe7dcafc2