
 
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-08-31T07:16:34Z
dc.date.available 2017-08-31T07:16:34Z
dc.date.issued 2017-08-17
dc.identifier.uri http://hdl.handle.net/11356/1139
dc.description Janes-Forum is an annotated corpus of Slovene forums from the websites med.over.net, avtomobilizem.com, and kvarkadabra.net, covering the period from 2001-02 to 2015-01. The corpus is structured into forums, threads and posts, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. To protect privacy and comply with the wishes of the platform owners, usernames are not included in the metadata, and 'person', 'person derivative' and 'company name' named entities have been removed from the texts.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://slovenscina2.0.trojina.si/arhiv/2016-2/2016-2-04/
dc.relation.isreferencedby http://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Forum
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject forums
dc.subject word normalisation
dc.subject named entities
dc.subject TEI
dc.title Forum corpus Janes-Forum 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 772953 texts
size.info 47066575 tokens
files.count 2
files.size 601073491
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=janes_forum
featuredService.noske search|https://www.clarin.si/noske/run.cgi/corp_info?corpname=janes_forum


 Files in this item

 Download all files in item (573.23 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name Janes-Forum.TEI.zip
Size 303.81 MB
Format application/zip
Description Corpus in TEI format
MD5 e0b2f3e7ffed5c5219732a39bf8e4a0f
 File Preview  
  • Janes-Forum.TEI
    • janes.forum.xml (12 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.forum.back.xml (465 kB)
    • janes.forum.body.xml (2 GB)
    • 00README.txt (163 B)
Name Janes-Forum.vert.zip
Size 269.41 MB
Format application/zip
Description Derived corpus in vertical format
MD5 e8c518a4ef121febde45590763b680cf
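
The MD5 values listed for the two archives can be used to check that a download arrived intact. The following is a minimal sketch in Python, assuming the archives were saved under the file names shown in this record. The helper names `md5_of_file` and `verify_download` are illustrative only and are not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from this item record.
EXPECTED_MD5 = {
    "Janes-Forum.TEI.zip": "e0b2f3e7ffed5c5219732a39bf8e4a0f",
    "Janes-Forum.vert.zip": "e8c518a4ef121febde45590763b680cf",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its file name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no MD5 listed for {name!r}")
    return md5_of_file(path) == expected[name]
```

Calling `verify_download("Janes-Forum.TEI.zip")` in the directory holding the download returns `True` when the digest matches the listed value. Reading in chunks keeps memory use low for archives of several hundred megabytes.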
