
 
dc.contributor.author Shekhar, Ravi
dc.contributor.author Pranjic, Marko
dc.contributor.author Pollak, Senja
dc.contributor.author Pelicon, Andraž
dc.contributor.author Purver, Matthew
dc.date.accessioned 2021-05-24T09:17:37Z
dc.date.available 2021-05-24T09:17:37Z
dc.date.issued 2021-04-19
dc.identifier.uri http://hdl.handle.net/11356/1399
dc.description A dataset of user comments from the 24sata.hr news portal, extracted from the portal's database of user comments and provided for research purposes within EMBEDDIA, a Horizon 2020 project. 24sata.hr is the largest-circulation daily newspaper in Croatia, reaching on average 2 million readers daily. The dataset provides comment metadata, including the link to the relevant article, the anonymized ID of the comment author, and a timestamp. Comments are also labelled to indicate whether they were blocked by human moderators.
Description of the dataset. The 24sata dataset consists of 11 columns and 21548192 rows. Each row represents one user comment on the 24sata news portal. Comments are added by registered users below the published news article.
Columns:
'comment_id' - The internal id of the comment. Unique for each row.
'user_id' - The internal id of the user who wrote the comment. Unique for each user. '0' for all blocked comments.
'content' - The content (text) of the user comment.
'site' - The site the comment came from.
'reply_to_id' - The 'comment_id' of the parent comment, if this comment was intended as a reply.
'created_date' - The date the comment was created.
'last_change' - The date the comment was last edited.
'article_id' - A public id of the article where this comment was posted. The article itself can be accessed by appending article_id to the site: an article with article_id 614684 and site 'www.24sata.hr' can be found at 'www.24sata.hr/a-614684' (note the added 'a-' before the article_id).
'infringed_on_rule' - If the user has infringed on a rule with this comment, the id of the rule is given. The description of the rules is given below.
'like_counts' - The number of times other users have voted in favour of this comment, similar to a Like button.
'dislike_counts' - The number of times other users have voted against this comment, the opposite of a Like button.
dc.language.iso hrv
dc.publisher Styria Media Group
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://doi.org/10.21248/jlcl.34.2020.224
dc.relation.isreferencedby https://www.aclweb.org/anthology/2021.hackashop-1.14.pdf
dc.rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.label PUB
dc.source.uri http://embeddia.eu/
dc.subject news comments
dc.subject offensive language
dc.subject comment moderation
dc.subject croatian comment moderation
dc.title 24sata news comment dataset 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matthew Purver m.purver@qmul.ac.uk Queen Mary University
contact.person Ravi Shekhar r.shekhar@qmul.ac.uk Queen Mary University
contact.person Marko Pranjic marko@entropia.hr Styria Media Group
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
size.info 21548192 texts
files.count 3
files.size 2030974229
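
The article URL rule stated in dc.description (append 'a-' and the article_id to the site) can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the function name `article_url` is hypothetical, and the result carries no scheme because the description itself gives none; prepending `https://` would be an assumption.

```python
from typing import Union


def article_url(site: str, article_id: Union[int, str]) -> str:
    """Join 'site' and 'article_id' as described in the record.

    The record states that article 614684 on 'www.24sata.hr' is found at
    'www.24sata.hr/a-614684', i.e. the id is prefixed with 'a-'.
    Hypothetical helper, not shipped with the dataset.
    """
    # Strip a trailing slash so the site and the path join with exactly one '/'.
    return f"{site.rstrip('/')}/a-{article_id}"


if __name__ == "__main__":
    # Example taken directly from the dataset description.
    print(article_url("www.24sata.hr", 614684))
```

Applied to the 'site' and 'article_id' values of a row, this would give the address of the article the comment was posted under.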


 Files in this item

Name: commenting-rules-24sata.md
Size: 5.72 KB
Format: Unknown
Description: Commenting Rule
MD5: d35436d2d34bb12d6750f5a97b7016fc

Name: README.md
Size: 2.43 KB
Format: Unknown
Description: ReadMe
MD5: 838c47e786a49eed3849728fb9d6e9c9

Name: Styria-user-comments.zip
Size: 1.89 GB
Format: application/zip
Description: 24 Sata Comments
MD5: 4722d9fa664299d85e6c485a5319fc54
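
The MD5 values listed for the three files can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify`, and the assumption that the files sit in one local directory under their listed names, are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list in this record.
EXPECTED_MD5 = {
    "commenting-rules-24sata.md": "d35436d2d34bb12d6750f5a97b7016fc",
    "README.md": "838c47e786a49eed3849728fb9d6e9c9",
    "Styria-user-comments.zip": "4722d9fa664299d85e6c485a5319fc54",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for the 1.89 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the downloads are in the current directory (hypothetical layout).
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

MD5 is adequate here for detecting transfer errors; it is not a guarantee against deliberate tampering.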
