dc.contributor.author | Shekhar, Ravi |
dc.contributor.author | Pranjic, Marko |
dc.contributor.author | Pollak, Senja |
dc.contributor.author | Pelicon, Andraž |
dc.contributor.author | Purver, Matthew |
dc.date.accessioned | 2021-05-24T09:17:37Z |
dc.date.available | 2021-05-24T09:17:37Z |
dc.date.issued | 2021-04-19 |
dc.identifier.uri | http://hdl.handle.net/11356/1399 |
dc.description | A dataset of user comments provided for research purposes for EMBEDDIA, a Horizon 2020 project, and extracted from the database of user comments of the 24sata.hr news portal. 24sata.hr is the largest-circulation daily newspaper in Croatia, reaching on average 2 million readers daily. The dataset provides comment metadata, including a link to the relevant article, the anonymized ID of the comment author, and a timestamp. Comments that were blocked by human moderators are labelled as such. Description of the dataset: the 24sata dataset consists of 11 columns and 21548192 rows. Each row represents one user comment on the 24sata news portal. Comments are added by registered users below the published news article. Columns: 'comment_id' - The internal ID of the comment. Unique for each row. 'user_id' - The internal ID of the user who wrote the comment. Unique for each user; '0' for all blocked comments. 'content' - The content (text) of the user comment. 'site' - The site the comment came from. 'reply_to_id' - The 'comment_id' of the parent comment, if this comment was intended as a reply. 'created_date' - The date the comment was created. 'last_change' - The date the comment was last edited. 'article_id' - A public ID of the article under which the comment was posted. The article itself can be accessed by appending 'a-' followed by the article_id to the site, so an article with article_id 614684 and site 'www.24sata.hr' can be found at 'www.24sata.hr/a-614684' (note the added 'a-' before the article ID). 'infringed_on_rule' - If the user infringed a rule with this comment, the ID of that rule is given. The rules are described in the accompanying commenting-rules-24sata.md file. 'like_counts' - The number of times other users voted in favour of this comment, similar to a Like button. 'dislike_counts' - The number of times other users voted against this comment, the opposite of a Like button. |
dc.language.iso | hrv |
dc.publisher | Styria Media Group |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825153 |
dc.relation.isreferencedby | https://doi.org/10.21248/jlcl.34.2020.224 |
dc.relation.isreferencedby | https://www.aclweb.org/anthology/2021.hackashop-1.14.pdf |
dc.rights | Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://embeddia.eu/ |
dc.subject | news comments |
dc.subject | offensive language |
dc.subject | comment moderation |
dc.title | 24sata news comment dataset 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Matthew Purver m.purver@qmul.ac.uk Queen Mary University |
contact.person | Ravi Shekhar r.shekhar@qmul.ac.uk Queen Mary University |
contact.person | Marko Pranjic marko@entropia.hr Styria Media Group |
sponsor | European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153 |
size.info | 21548192 texts |
files.count | 3 |
files.size | 2030974229 |
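
The article address rule given in dc.description ('site', then 'a-', then 'article_id') can be sketched in Python as below. This is a minimal illustration, not part of the dataset: the helper names `article_url` and `urls_from_csv` are hypothetical, and the comma-separated layout with a header row is an assumption that should be checked against README.md before reading the real 24sata-user-comments.csv.

```python
import csv
import io


def article_url(site: str, article_id) -> str:
    """Return the article address: the site, then '/a-', then the article_id."""
    return f"{site}/a-{article_id}"


# Two-column excerpt built from the example in the description. The real file
# has 11 columns; its delimiter and quoting are assumptions here.
SAMPLE = "site,article_id\nwww.24sata.hr,614684\n"


def urls_from_csv(text_stream):
    """Read rows with 'site' and 'article_id' columns and build article addresses."""
    reader = csv.DictReader(text_stream)
    return [article_url(row["site"], row["article_id"]) for row in reader]


print(urls_from_csv(io.StringIO(SAMPLE)))  # ['www.24sata.hr/a-614684']
```

Reading the file row by row with `csv.DictReader` avoids loading all 21548192 rows into memory at once.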
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Name | Size | Format | Description | MD5 |
commenting-rules-24sata.md | 5.72 KB | Unknown | Commenting Rule | d35436d2d34bb12d6750f5a97b7016fc |
README.md | 2.43 KB | Unknown | ReadMe | 838c47e786a49eed3849728fb9d6e9c9 |
Styria-user-comments.zip | 1.89 GB | application/zip | 24 Sata Comments | 4722d9fa664299d85e6c485a5319fc54 |

- __MACOSX
  - Styria-user-comments
    - ._README.md (1 B)
    - ._.DS_Store (1 B)
    - ._commenting-rules-24sata.md (1 B)
- Styria-user-comments
  - README.md (1 B)
  - commenting-rules-24sata.md (1 B)
  - .DS_Store (1 B)
  - 24sata-user-comments.csv (1 B)
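
The MD5 values listed for the three files can be used to check a download. The sketch below uses Python's standard `hashlib`; the names `LISTED_MD5`, `md5_hex`, and `verify_file` are hypothetical helpers, and the local file paths are whatever the downloads were saved as.

```python
import hashlib

# Checksums copied from the file listing of this item.
LISTED_MD5 = {
    "commenting-rules-24sata.md": "d35436d2d34bb12d6750f5a97b7016fc",
    "README.md": "838c47e786a49eed3849728fb9d6e9c9",
    "Styria-user-comments.zip": "4722d9fa664299d85e6c485a5319fc54",
}


def md5_hex(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so the 1.89 GB archive is never held in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_md5):
    """Return True if the file at `path` has the expected MD5 hex digest."""
    with open(path, "rb") as handle:
        return md5_hex(handle) == expected_md5.lower()


# Example call, assuming the archive was saved under its listed name:
# verify_file("Styria-user-comments.zip", LISTED_MD5["Styria-user-comments.zip"])
```

MD5 is used here only because it is the checksum the repository publishes; it detects corrupted downloads but is not a safeguard against deliberate tampering.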