dc.contributor.author Ivačič, Nikola
dc.contributor.author Pelicon, Andraž
dc.contributor.author Koloski, Boshko
dc.contributor.author Pollak, Senja
dc.contributor.author Purver, Matthew
dc.date.accessioned 2024-11-13T11:22:31Z
dc.date.available 2024-11-13T11:22:31Z
dc.date.issued 2024-11-13
dc.identifier.uri http://hdl.handle.net/11356/1987
dc.description We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages except Estonian, we include pairs of a source URL (where the corresponding text can be found) and a sentiment label. For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408). The data is organized in Tab-Separated Values (TSV) format. For Serbian, Bosnian, Macedonian, and Albanian, the dataset contains two columns: sourceURL and sentiment. For Estonian, the dataset consists of three columns: text ID (from the CLARIN.SI reference above), body text, and sentiment label.
dc.language.iso bos
dc.language.iso srp
dc.language.iso mkd
dc.language.iso sqi
dc.language.iso est
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://emma.ijs.si/
dc.subject closely related languages
dc.subject sentiment analysis
dc.subject sentiment classification
dc.title News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Boshko Koloski boshko.koloski@ijs.si Jožef Stefan Institute
contact.person Nikola Ivačič nikola.ivacic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) L2-50070 Embeddings-based techniques for Media Monitoring Applications nationalFunds
size.info 898 items
files.count 5
files.size 356799
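The description above specifies two TSV layouts: sourceURL and sentiment for the Serbian, Bosnian, Macedonian and Albanian files, and text ID, body text and sentiment for the Estonian file. A minimal Python sketch for reading either layout follows. It assumes each file starts with a header row (pass has_header=False if it does not), uses hypothetical column names for the Estonian layout, and relies on the csv module's default quoting rules, which have not been checked against the actual files. The label strings and example.com URLs in any test data are illustrative only.

```python
import csv
import io
from collections import Counter

# Column names for the two layouts described in the record. "sourceURL" and
# "sentiment" come from the description; the Estonian names are hypothetical
# placeholders for "text ID, body text, sentiment label".
URL_COLUMNS = ("sourceURL", "sentiment")
ESTONIAN_COLUMNS = ("text_id", "body", "sentiment")


def load_sentiment_tsv(tsv_text, estonian=False, has_header=True):
    """Parse the contents of one SADEmma TSV file into a list of dicts.

    has_header is an assumption: set it to False if the file has no header row.
    """
    columns = ESTONIAN_COLUMNS if estonian else URL_COLUMNS
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for i, fields in enumerate(reader):
        if i == 0 and has_header:
            continue  # skip the header line
        if not fields:
            continue  # tolerate blank lines (csv.reader yields [] for them)
        if len(fields) != len(columns):
            raise ValueError(
                f"row {i}: expected {len(columns)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


def label_distribution(rows):
    """Count how many items carry each sentiment label."""
    return Counter(row["sentiment"] for row in rows)
```

Reading a file is then a matter of passing its decoded contents, for example load_sentiment_tsv(open("serbian_SADEmma.tsv", encoding="utf-8").read()); the UTF-8 encoding is likewise an assumption.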


 Files in this item

 Total size of all files: 348.44 KB
Name macedonia_SADEmma.tsv
Size 27.04 KB
Format Unknown
Description Dataset of 198 sentiment-annotated Macedonian news articles.
MD5 51e80fd16e0433ad55897c815c806944

Name estonian_SADEmma.tsv
Size 263.22 KB
Format Unknown
Description Dataset of 100 sentiment-annotated Estonian news articles.
MD5 b5d19ec7df8ce2150712ed21958f6504

Name albanian_SADEmma.tsv
Size 20.8 KB
Format Unknown
Description Dataset of 200 sentiment-annotated Albanian news articles.
MD5 64fa04afbf96d36d94078e5866de9048

Name serbian_SADEmma.tsv
Size 18.17 KB
Format Unknown
Description Dataset of 200 sentiment-annotated Serbian news articles.
MD5 389614e7fb56b41613f1a47ced573676

Name bosnian_SADEmma.tsv
Size 19.21 KB
Format Unknown
Description Dataset of 200 sentiment-annotated Bosnian news articles.
MD5 0b4bea7aeea9c0612c1f5defe7dcafc2
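The MD5 checksums listed above can be used to check that downloaded files are intact. A minimal Python sketch using the standard hashlib module, assuming the five files were saved under their listed names in a single directory of your choosing:

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "macedonia_SADEmma.tsv": "51e80fd16e0433ad55897c815c806944",
    "estonian_SADEmma.tsv": "b5d19ec7df8ce2150712ed21958f6504",
    "albanian_SADEmma.tsv": "64fa04afbf96d36d94078e5866de9048",
    "serbian_SADEmma.tsv": "389614e7fb56b41613f1a47ced573676",
    "bosnian_SADEmma.tsv": "0b4bea7aeea9c0612c1f5defe7dcafc2",
}


def md5_of_file(path, chunk_size=1 << 16):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {filename: status}, where status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

For example, verify_downloads("SADEmma") (a hypothetical download directory) returns a dict mapping each of the five file names to its status.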
