dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2017-08-31T07:19:08Z |
dc.date.available | 2017-08-31T07:19:08Z |
dc.date.issued | 2017-08-17 |
dc.identifier.uri | http://hdl.handle.net/11356/1140 |
dc.description | Janes-News is an annotated corpus of comments on online news articles from the websites rtvslo.si, mladina.si, and reporter.si, covering the period 2007-03 to 2015-01. The corpus is structured into individual texts, each containing the comments on one news article together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. To protect privacy, usernames are not included in the metadata, and 'person' as well as 'person derivative' named entities have been removed from the texts. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://doi.org/10.4312/slo2.0.2016.2.67-99 |
dc.relation.isreferencedby | https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-News |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9425-z |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://nl.ijs.si/janes/ |
dc.subject | computer-mediated communication |
dc.subject | news comments |
dc.subject | word normalisation |
dc.subject | named entities |
dc.subject | TEI |
dc.title | News comment corpus Janes-News 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
contact.person | Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 299219 texts |
size.info | 14838074 tokens |
files.count | 2 |
files.size | 195540371 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=janes_news |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=janes_news |
Files in this item
Download all files in item (186.48 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

- Name: Janes-News.TEI.zip
- Size: 98.84 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: 6a2718c8cbdc34dc6678df13f8cc3eca
- Janes-News.TEI
  - janes.news.xml (13 kB)
  - janes.news.back.xml (465 kB)
  - schema
    - tei_janes_doc.html (2 MB)
    - tei_janes.rng (399 kB)
    - tei_janes_schema.xml (2 kB)
    - tei_janes.zip (44 kB)
    - tei_janes.rnc (188 kB)
  - janes.news.body.xml (802 MB)
  - 00README.txt (164 B)

- Name: Janes-News.vert.zip
- Size: 87.64 MB
- Format: application/zip
- Description: Derived corpus in vertical format
- MD5: 8ffcd6cc9162d188abef598ffb8bb226
- Janes-News.vert
  - janes_news.vert (535 MB)
  - janes_news.regi (5 kB)
  - 00README.txt (164 B)
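
The MD5 values listed for the two archives can be used to check that a download completed intact. The following is a minimal sketch, assuming both ZIP files have been saved under their listed names in a single local directory. The function names `md5_of_file` and `verify_downloads` are illustrative and not part of any CLARIN.SI tooling; only the file names and checksums come from this record.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "Janes-News.TEI.zip": "6a2718c8cbdc34dc6678df13f8cc3eca",
    "Janes-News.vert.zip": "8ffcd6cc9162d188abef598ffb8bb226",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed download location (hypothetical)
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A `False` result does not distinguish a missing file from a corrupted one; checking `Path(...).is_file()` separately would tell the two cases apart.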