dc.contributor.author | Shekhar, Ravi |
dc.contributor.author | Pollak, Senja |
dc.contributor.author | Pelicon, Andraž |
dc.contributor.author | Purver, Matthew |
dc.contributor.author | Krustok, Ivar |
dc.date.accessioned | 2021-05-24T09:21:16Z |
dc.date.available | 2021-05-24T09:21:16Z |
dc.date.issued | 2021-04-19 |
dc.identifier.uri | http://hdl.handle.net/11356/1401 |
dc.description | This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some in Russian.
Description of the datasets. There are 11 CSV files:
- comments_2009.csv contains 2 898 438 comments from the year 2009
- comments_2010.csv contains 2 377 591 comments from the year 2010
- comments_2011.csv contains 2 729 389 comments from the year 2011
- comments_2012.csv contains 3 372 776 comments from the year 2012
- comments_2013.csv contains 3 289 393 comments from the year 2013
- comments_2014.csv contains 3 195 502 comments from the year 2014
- comments_2015.csv contains 3 202 592 comments from the year 2015
- comments_2016.csv contains 2 848 624 comments from the year 2016
- comments_2017.csv contains 2 838 075 comments from the year 2017
- comments_2018.csv contains 3 194 597 comments from the year 2018
- comments_2019.csv contains 1 526 755 comments from the year 2019 (May)
In sum: 31 473 732 comments.
Columns:
- comment_id (string) - the ID of the written comment
- article_id (string) - the ID of the article for which the comment was written
- created_time (string) - the time and date of the comment
- subject (string) - the title of the comment
- reply_to_comment_id (string) - the ID of the parent comment
- content (string) - the comment itself
- is_anonymous (string) - 1 if the comment was published anonymously, 0 if the comment was published by a registered user
- is_enabled (string) - 1 if the comment was published (online), 0 if it wasn't published. Questionable field: not all have been manually moderated; no additional information from the moderators
- channel_language (string) - the language of the channel: 'nat' for Estonian, 'rus' for Russian
- create_user_id (string) - the user ID of the commentator; '0' for all blocked comments
- moderated_by (string) - the ID of the moderator |
dc.language.iso | est |
dc.language.iso | rus |
dc.publisher | Ekspress Meedia Group |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825153 |
dc.relation.isreferencedby | https://doi.org/10.21248/jlcl.34.2020.224 |
dc.relation.isreferencedby | https://www.aclweb.org/anthology/2021.hackashop-1.14.pdf |
dc.rights | Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://embeddia.eu/ |
dc.subject | news comments |
dc.subject | comment moderation |
dc.subject | offensive language |
dc.title | Ekspress user comment dataset 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Ravi Shekhar r.shekhar@qmul.ac.uk Queen Mary University |
contact.person | Matthew Purver m.purver@qmul.ac.uk Queen Mary University |
contact.person | Ivar Krustok ivar.krustok@ekspressmeedia.ee Ekspress Meedia |
sponsor | European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153 |
size.info | 31473732 texts |
files.count | 12 |
files.size | 10681774976 |
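
The column list in the description can be turned into a small reader. The following is a minimal sketch, not the authors' own tooling: it assumes that each CSV file starts with a header row using exactly the column names listed above and that fields are comma-separated, neither of which is stated in this record. The sample rows are invented for illustration, including the timestamp format, and are not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "comment_id", "article_id", "created_time", "subject",
    "reply_to_comment_id", "content", "is_anonymous", "is_enabled",
    "channel_language", "create_user_id", "moderated_by",
]


def iter_comments(handle):
    """Yield one dict per row, keeping every value as a string.

    Assumption: the file has a header row with the names in COLUMNS.
    """
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield row


def count_published(handle):
    """Count rows with is_enabled == '1', grouped by channel_language."""
    counts = {}
    for row in iter_comments(handle):
        if row["is_enabled"] == "1":
            lang = row["channel_language"]
            counts[lang] = counts.get(lang, 0) + 1
    return counts


# Invented sample rows (NOT real data); the timestamp format is a guess.
SAMPLE = (
    ",".join(COLUMNS) + "\n"
    "c1,a1,2019-01-01 10:00:00,hypothetical subject,,first comment,1,1,nat,u1,m1\n"
    "c2,a1,2019-01-01 10:05:00,,c1,a reply,0,0,nat,0,m1\n"
    "c3,a2,2019-01-02 08:00:00,,,third comment,1,1,rus,u2,\n"
)

if __name__ == "__main__":
    print(count_published(io.StringIO(SAMPLE)))
```

For a real file, the same functions could be called on `open("comments_2019.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption. Because the description flags is_enabled as a questionable field, counts derived from it should be treated with caution.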
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Files in this item
Name | Size | Format | Description | MD5 |
Readme.md | 3.66 KB | Unknown | ReadMe | 054276a0ceab803663d8b31af1e5fb79 |
comments_2013.csv | 1.07 GB | CSV file | 2013 Data | 45a2adbb98b1e72e481215c635be42a6 |
comments_2010.csv | 786.24 MB | CSV file | 2010 Data | ddb9e6aa755002299de9496cfe77c0f1 |
comments_2018.csv | 951.93 MB | CSV file | 2018 Data | 8445cd07ba00cc3b745f2c5ff8eede60 |
comments_2014.csv | 1.07 GB | CSV file | 2014 Data | f72fd02a9b6780fa39cc0149c0593d34 |
comments_2009.csv | 946.31 MB | CSV file | 2009 Data | 021703c48abac14ccb3a5d1499895e72 |
comments_2011.csv | 933.16 MB | CSV file | 2011 Data | 72e245afeb43250cfe49d4a73edacab5 |
comments_2012.csv | 1.14 GB | CSV file | 2012 Data | 9a20a1347aff7deb832d51d65999e4c6 |
comments_2015.csv | 1013.47 MB | CSV file | 2015 Data | 62922a7761986094e4ff17e06f2fdd1e |
comments_2019.csv | 408.71 MB | CSV file | 2019 Data | db5b94aaf9c55c135ef2137b6ec73b48 |
comments_2016.csv | 893.75 MB | CSV file | 2016 Data | 9d357afc1a19316f1e55a5623e7aca5d |
comments_2017.csv | 892.98 MB | CSV file | 2017 Data | a6826db81e10699745228c62a7b94da6 |
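
The MD5 values listed for each file can be used to check a download for corruption. The sketch below is one way to do that with Python's standard library; the function names `md5_of_file` and `verify` and the directory layout are hypothetical, and only three of the twelve checksums are copied in to keep the example short. Files are read in chunks because several are around 1 GB.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing (subset; extend as needed).
EXPECTED_MD5 = {
    "Readme.md": "054276a0ceab803663d8b31af1e5fb79",
    "comments_2009.csv": "021703c48abac14ccb3a5d1499895e72",
    "comments_2019.csv": "db5b94aaf9c55c135ef2137b6ec73b48",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (absent)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == want) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Hypothetical usage: check files downloaded into the current directory.
    for name, ok in verify(".").items():
        status = {True: "OK", False: "MISMATCH", None: "not found"}[ok]
        print(f"{name}: {status}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.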