
 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.date.accessioned 2018-10-27T13:50:26Z
dc.date.available 2018-10-27T13:50:26Z
dc.date.issued 2018-10-27
dc.identifier.uri http://hdl.handle.net/11356/1201
dc.description FRENK-MMC-RTV is a dataset of moderated news comments from the website rtvslo.si, with metadata on the time of publishing, the user identifier, the thread identifier, and whether or not the comment was deleted by the moderators. The full text of each comment is encrypted via a character-replacement method so that the comments are not human-readable. Basic punctuation is not encrypted, so that the text can still be tokenized. The dataset is primarily intended for experiments on automating comment moderation. For real-world use, a fastText classification model trained on the non-encrypted data is also provided.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://drive.google.com/file/d/13m7PFn49_tnEfFjcbqk8cugG4ZTy2A5I/view
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://nl.ijs.si/frenk/
dc.subject computer-mediated communication
dc.subject news comments
dc.subject content moderation
dc.title Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society (national funds)
size.info 7597560 texts
size.info 325225576 words
files.count 2
files.size 4987598838 bytes
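The description states that comment text is encrypted by character replacement while basic punctuation is left intact, so that tokenization still works. The exact cipher and its alphabet are not documented in this record. The Python sketch below assumes a simple letter-substitution cipher (a seeded random permutation of a hypothetical alphabet `LOWER`) purely to illustrate why token boundaries survive this kind of encryption; it is not the method used to produce the dataset, and `make_key`, `encrypt` and `decrypt` are illustrative names.

```python
import random
import string

# Hypothetical alphabet: ASCII letters plus diacritic letters common in
# Slovene comments. The alphabet actually used for FRENK-MMC-RTV is unknown.
LOWER = string.ascii_lowercase + "čćđšž"


def make_key(seed: int) -> dict[str, str]:
    """Build a letter-to-letter substitution table from a seeded permutation."""
    rng = random.Random(seed)
    shuffled = list(LOWER)
    rng.shuffle(shuffled)
    key = dict(zip(LOWER, shuffled))
    # Apply the same permutation to uppercase letters so casing is kept.
    key.update({src.upper(): dst.upper() for src, dst in zip(LOWER, shuffled)})
    return key


def encrypt(text: str, key: dict[str, str]) -> str:
    """Substitute letters; punctuation, digits and whitespace pass through."""
    return "".join(key.get(ch, ch) for ch in text)


def decrypt(text: str, key: dict[str, str]) -> str:
    """Invert the substitution (only possible for whoever holds the key)."""
    inverse = {dst: src for src, dst in key.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


if __name__ == "__main__":
    key = make_key(seed=42)
    comment = "Dober dan, kako ste?"
    secret = encrypt(comment, key)
    print(secret)
    # Whitespace and punctuation are unchanged, so a naive whitespace
    # tokenizer finds the same number of tokens in the encrypted text.
    print(len(comment.split()) == len(secret.split()))
```

Because each letter in this sketch always maps to the same substitute, identical words stay identical after encryption, so word and character n-gram features remain consistent across the corpus even though the text is unreadable.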


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: frenk-mmc.tbl.enc.zip
Size: 1.07 GB
Format: application/zip
Description: TSV dataset with encrypted texts
MD5: 74acc704b83272a640ad535917d1d52d
Archive contents:
    • frenk-mmc.tbl.enc (3 GB)
    • frenk-mmc.tbl.enc.readme (803 B)
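Given the size of the archives, it is worth checking each download against the MD5 checksums listed in this record before unpacking it. A minimal Python sketch using only the standard library (`hashlib`); the helper names are illustrative, and the archives are assumed to sit in the current directory under the file names given here.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path

# Checksums as published in this item record.
EXPECTED_MD5 = {
    "frenk-mmc.tbl.enc.zip": "74acc704b83272a640ad535917d1d52d",
    "frenk-mmc.tbl.model.zip": "42c654dcea0094d39f6f48f045b9a656",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the file in 1 MiB chunks so multi-GB archives never sit in memory."""
    with path.open("rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


if __name__ == "__main__":
    for name, expected in EXPECTED_MD5.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not downloaded")
            continue
        status = "OK" if md5_of_file(path) == expected else "MISMATCH"
        print(f"{name}: {status}")
```

Streaming the file keeps memory use constant regardless of archive size; a mismatch usually means an interrupted or corrupted download.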
Name: frenk-mmc.tbl.model.zip
Size: 3.58 GB
Format: application/zip
Description: fastText model
MD5: 42c654dcea0094d39f6f48f045b9a656
Archive contents:
    • frenk-mmc.tbl.model.bin (3 GB)
    • frenk-mmc.tbl.model.readme (552 B)
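The model archive contains a binary fastText model. Below is a hedged Python sketch of scoring a comment with the official `fasttext` bindings (`fasttext.load_model`, `get_labels`, `predict`). The label that marks a deleted comment is not stated in this record, so `deleted_label` (and the `"__label__1"` placeholder) is hypothetical: inspect `get_labels()` and the bundled readme first. The 0.5 threshold is likewise an arbitrary assumption. Any preprocessing applied to the training data should also be applied to the input; the sketch only collapses whitespace, because fastText's `predict` processes one line at a time and rejects text containing newlines.

```python
def deletion_probability(model, comment: str, deleted_label: str) -> float:
    """Return the model's probability for `deleted_label` (0.0 if absent).

    `model` is anything with a fastText-style predict(text, k) method that
    returns (labels, probabilities).
    """
    # fastText's predict() handles one line at a time, so flatten newlines.
    text = " ".join(comment.split())
    labels, probs = model.predict(text, k=-1)  # k=-1: scores for all labels
    scores = dict(zip(labels, probs))
    return float(scores.get(deleted_label, 0.0))


def should_flag(model, comment: str, deleted_label: str, threshold: float = 0.5) -> bool:
    """Flag a comment for human review when P(deleted) reaches the threshold."""
    return deletion_probability(model, comment, deleted_label) >= threshold


if __name__ == "__main__":
    import fasttext  # pip install fasttext

    model = fasttext.load_model("frenk-mmc.tbl.model.bin")
    print(model.get_labels())  # check which label means "deleted"
    # "__label__1" is a hypothetical placeholder; replace it after inspection.
    print(should_flag(model, "Primer komentarja.", deleted_label="__label__1"))
```

Taking the model as a plain argument keeps the scoring logic testable without loading the multi-gigabyte binary; for moderation support, flagged comments would typically go to a human moderator rather than be deleted automatically.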
