dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.date.accessioned 2018-10-27T13:53:27Z
dc.date.available 2018-10-27T13:53:27Z
dc.date.issued 2018-10-27
dc.identifier.uri http://hdl.handle.net/11356/1202
dc.description FRENK-STYRIA-24sata is a dataset of moderated newspaper comments from the website 24sata.hr, with metadata on the time of publishing, the user identifier, the thread identifier, and whether or not the comment was deleted by the moderators. The full text of each comment is encrypted via a character-replacement method so that the comments are not readable by humans. Basic punctuation is not encrypted, in order to enable tokenization. The main use of this dataset is for experiments on automating comment moderation. For real-world usage, a fastText classification model trained on non-encrypted data is made available as well.
dc.language.iso hrv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://drive.google.com/file/d/13m7PFn49_tnEfFjcbqk8cugG4ZTy2A5I/view
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/frenk/
dc.subject computer-mediated communication
dc.subject news comments
dc.subject content moderation
dc.title Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
size.info 17042965 texts
size.info 407549127 words
files.count 2
files.size 8186195223
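
The description above says that comment texts are encrypted by character replacement while basic punctuation is left untouched so that tokenization still works. The sketch below illustrates that general idea only. It is a toy substitution cipher, not the scheme used to produce this dataset: the alphabet, the lowercasing step, the seed and the function names (make_substitution_table, encrypt, tokenize) are all assumptions made for illustration.

```python
import random
import re
import string

# Hypothetical alphabet: ASCII lowercase letters plus Croatian diacritics.
ALPHABET = string.ascii_lowercase + "čćđšž"


def make_substitution_table(seed):
    """Build a random one-to-one letter mapping (toy scheme, not the dataset's)."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return str.maketrans(ALPHABET, "".join(shuffled))


def encrypt(text, table):
    """Replace letters via the table; punctuation and spaces pass through unchanged."""
    # Lowercasing is an assumption so that every letter is covered by the table.
    return text.lower().translate(table)


def tokenize(text):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)
```

Because letters are replaced one for one and punctuation is preserved, tokenize yields the same number of tokens, with the same lengths and the same punctuation marks, for a plain text and its encrypted form. That property is what allows token-based features to be computed on text that humans cannot read.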


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: frenk-sty.tbl.enc.zip
Size: 1.45 GB
Format: application/zip
Description: TSV dataset with encrypted texts
MD5: aafb5a1e58790722bbf75bc50ea3f2dc
Contents:
    • frenk-sty.tbl.enc (5 GB)
    • frenk-sty.tbl.enc.readme (810 B)

Name: frenk-sty.tbl.model.zip
Size: 6.18 GB
Format: application/zip
Description: fastText model
MD5: a8e991ccbfa9444d97e5a2f3542d029f
Contents:
    • frenk-sty.tbl.model.readme (561 B)
    • frenk-sty.tbl.model.bin (6 GB)
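
Since both archives are several gigabytes in size, it may be worth checking a download against the MD5 values listed for the files in this item. The following is a minimal sketch using only the Python standard library. The file names and checksums are taken from this record; the function names (md5_of_file, verify_downloads) and the assumption that the archives sit in a single local directory are illustrative.

```python
import hashlib
from pathlib import Path

# File names and checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "frenk-sty.tbl.enc.zip": "aafb5a1e58790722bbf75bc50ea3f2dc",
    "frenk-sty.tbl.model.zip": "a8e991ccbfa9444d97e5a2f3542d029f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed archive found in `directory` to True (match) or False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():  # archives that are not present are skipped
            results[name] = md5_of_file(path) == expected
    return results
```

Calling verify_downloads() in the download directory returns a dictionary with one entry per archive that is present. A value of False suggests a truncated or corrupted download. Reading in chunks keeps memory use flat regardless of archive size.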
