
 
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-08-31T07:13:40Z
dc.date.available 2017-08-31T07:13:40Z
dc.date.issued 2017-08-17
dc.identifier.uri http://hdl.handle.net/11356/1138
dc.description Janes-Blog is an annotated corpus of Slovene blogs from the websites rtvslo.si and publishwall.si, covering the period from 2006-10 to 2016-01. The corpus is structured into individual texts, each containing a blog post and the comments on it, together with their metadata. The texts are tokenised, sentence-segmented, word-normalised, morphosyntactically tagged, lemmatised and annotated with named entities. To protect privacy, usernames are not included in the metadata, and 'person' and 'person derivative' named entities have been removed from the texts.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://doi.org/10.4312/slo2.0.2016.2.67-99
dc.relation.isreferencedby https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Blog
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject blogs
dc.subject word normalisation
dc.subject named entities
dc.subject TEI
dc.title Blog post and comment corpus Janes-Blog 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 404281 texts
size.info 34534431 tokens
files.count 2
files.size 431290698
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=janes_blog
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=janes_blog


 Files in this item

 Total size of all files in item: 411.31 MB
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0); attribution is required.

Name: Janes-Blog.TEI.zip
Size: 212.66 MB
Format: application/zip
Description: Corpus in TEI format
MD5: 3283531fc545a3c9a33855bf170354e2
Contents:
  • Janes-Blog.TEI
    • janes.blog.back.xml (465 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.blog.xml (12 kB)
    • janes.blog.body.xml (1 GB)
    • 00README.txt (176 B)

Name: Janes-Blog.vert.zip
Size: 198.65 MB
Format: application/zip
Description: Derived corpus in vertical format
MD5: 1c77a5dc284d4093446dd64c14855cb2
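
To check a downloaded archive against the MD5 checksums listed in this record, a minimal Python sketch could look like the following. It assumes the archives keep their original file names (Janes-Blog.TEI.zip, Janes-Blog.vert.zip); the helper names md5_of_file and verify_download are hypothetical and not part of any CLARIN.SI tool.

```python
import hashlib
import os
import sys

# MD5 checksums as listed in this record, keyed by the original file name
# (assumption: downloaded files have not been renamed).
EXPECTED_MD5 = {
    "Janes-Blog.TEI.zip": "3283531fc545a3c9a33855bf170354e2",
    "Janes-Blog.vert.zip": "1c77a5dc284d4093446dd64c14855cb2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Hypothetical helper: True if the file's MD5 matches the listed value."""
    name = os.path.basename(path)
    if name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {name!r}")
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    # Check every path given on the command line.
    for p in sys.argv[1:]:
        print(p, "OK" if verify_download(p) else "MISMATCH")
```

Saved under a hypothetical name such as verify_janes.py, it could be run as `python verify_janes.py Janes-Blog.TEI.zip Janes-Blog.vert.zip`. Reading in chunks keeps memory use low even though the archives are around 200 MB each.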
