
 
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-08-31T07:19:08Z
dc.date.available 2017-08-31T07:19:08Z
dc.date.issued 2017-08-17
dc.identifier.uri http://hdl.handle.net/11356/1140
dc.description Janes-News is an annotated corpus of comments on online news articles from the websites rtvslo.si, mladina.si, and reporter.si, covering the period 2007-03 to 2015-01. The corpus is structured into individual texts, each containing the comments on one news article together with their metadata. The texts are tokenised, sentence-segmented, word-normalised, morphosyntactically tagged, lemmatised, and annotated with named entities. To protect privacy, usernames are not included in the metadata, and 'person' as well as 'person derivative' named entities have been removed from the texts.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://doi.org/10.4312/slo2.0.2016.2.67-99
dc.relation.isreferencedby https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-News
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject news comments
dc.subject word normalisation
dc.subject named entities
dc.subject TEI
dc.title News comment corpus Janes-News 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 299219 texts
size.info 14838074 tokens
files.count 2
files.size 195540371
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=janes_news
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=janes_news


 Files in this item

 Download all files in item (186.48 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name Janes-News.TEI.zip
Size 98.84 MB
Format application/zip
Description Corpus in TEI format
MD5 6a2718c8cbdc34dc6678df13f8cc3eca
 File Preview
  • Janes-News.TEI
    • janes.news.xml (13 kB)
    • janes.news.back.xml (465 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.news.body.xml (802 MB)
    • 00README.txt (164 B)
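
Because janes.news.body.xml is listed at 802 MB, loading it into memory in one piece may be impractical. The following is a minimal sketch of streaming through such a file and counting elements by local name. The helper names `local_name` and `count_elements` are hypothetical, and the assumption that word tokens and sentences are encoded as TEI `w` and `s` elements is not confirmed by this record; the schema files in the archive are the authority on the actual element names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace such as '{http://www.tei-c.org/ns/1.0}w' down to 'w'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(source):
    """Stream an XML file (path or binary file object) and count elements by local name.

    Each element is cleared once its end tag has been seen, so the children of
    finished elements do not accumulate in memory.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()  # drop text, attributes and children that were already counted
    return counts
```

Calling `count_elements("Janes-News.TEI/janes.news.body.xml")` on the unpacked archive would return a `Counter` of element names. The size.info values in this record (299219 texts, 14838074 tokens) could serve as a rough cross-check, although the record does not state how tokens map onto elements.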
Name Janes-News.vert.zip
Size 87.64 MB
Format application/zip
Description Derived corpus in vertical format
MD5 8ffcd6cc9162d188abef598ffb8bb226
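
The MD5 values listed for the two archives can be used to check that a download completed intact. The sketch below is one way to do this with Python's standard `hashlib`. The function names `md5_of_file` and `verify` and the local file paths are assumptions made for illustration; only the file names and checksums come from this record.

```python
import hashlib

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "Janes-News.TEI.zip": "6a2718c8cbdc34dc6678df13f8cc3eca",
    "Janes-News.vert.zip": "8ffcd6cc9162d188abef598ffb8bb226",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare the file at `path` with the checksum listed for archive `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("downloads/Janes-News.TEI.zip", "Janes-News.TEI.zip")` would return `True` only if the local copy (at a hypothetical path) matches the published checksum.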
