dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2018-10-27T13:53:27Z |
dc.date.available | 2018-10-27T13:53:27Z |
dc.date.issued | 2018-10-27 |
dc.identifier.uri | http://hdl.handle.net/11356/1202 |
dc.description | FRENK-STYRIA-24sata is a dataset of moderated newspaper comments from the website 24sata.hr, with metadata on the time of publishing, the user identifier, the thread identifier, and whether or not the comment was deleted by the moderators. The full text of each comment is encrypted with a character-replacement method so that the comments are not human-readable. Basic punctuation is left unencrypted so that the texts can still be tokenized. The dataset is primarily intended for experiments on automating comment moderation. For real-world use, a fastText classification model trained on the non-encrypted data is also made available. |
dc.language.iso | hrv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://drive.google.com/file/d/13m7PFn49_tnEfFjcbqk8cugG4ZTy2A5I/view |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://nl.ijs.si/frenk/ |
dc.subject | computer-mediated communication |
dc.subject | news comments |
dc.subject | content moderation |
dc.title | Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds |
size.info | 17042965 texts |
size.info | 407549127 words |
files.count | 2 |
files.size | 8186195223 bytes |
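The record does not specify the character-replacement mapping used for the comments. The following toy substitution cipher is only an illustration, assuming a random one-to-one permutation over letters (including the Croatian č, ć, đ, š, ž) and digits. It shows why leaving whitespace and basic punctuation untouched keeps the encrypted texts tokenizable. `ALPHABET`, `make_key`, `encrypt`, `decrypt`, and `tokenize` are hypothetical names, not part of the released data.

```python
import random
import re
import string

# Hypothetical alphabet: ASCII letters, digits and Croatian diacritics.
# Anything outside it (whitespace, punctuation) passes through unchanged.
ALPHABET = string.ascii_letters + string.digits + "čćđšžČĆĐŠŽ"


def make_key(alphabet: str = ALPHABET, seed: int = 0) -> dict:
    """Build a random one-to-one character mapping (a toy substitution key)."""
    rng = random.Random(seed)
    shuffled = list(alphabet)
    rng.shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def encrypt(text: str, key: dict) -> str:
    """Replace each mapped character; leave punctuation and spaces as-is."""
    return "".join(key.get(ch, ch) for ch in text)


def decrypt(text: str, key: dict) -> str:
    """Invert the mapping (possible only because it is a bijection)."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


def tokenize(text: str) -> list:
    """Simple tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)
```

In this toy version every letter or digit maps to another letter or digit, so word boundaries, word lengths, and punctuation positions survive encryption: `tokenize(encrypt(s, key))` yields as many tokens as `tokenize(s)`, while the words themselves are unreadable.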
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Name | Size | Format | Description | MD5 |
frenk-sty.tbl.enc.zip | 1.45 GB | application/zip | TSV dataset with encrypted texts | aafb5a1e58790722bbf75bc50ea3f2dc |
frenk-sty.tbl.model.zip | 6.18 GB | application/zip | fastText model | a8e991ccbfa9444d97e5a2f3542d029f |
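Given the file sizes, it may be worth checking the downloads against the MD5 checksums listed above before unpacking them. A minimal sketch using only the Python standard library, assuming both zip files were saved under their original names in one directory; `md5_of` and `verify` are hypothetical helper names. The files are read in 1 MiB chunks so that the 6 GB model archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksums as listed for this item.
EXPECTED_MD5 = {
    "frenk-sty.tbl.enc.zip": "aafb5a1e58790722bbf75bc50ea3f2dc",
    "frenk-sty.tbl.model.zip": "a8e991ccbfa9444d97e5a2f3542d029f",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: str = ".") -> dict:
    """Return {filename: True/False}; missing files count as failures."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```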