dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-08-31T07:16:34Z
dc.date.available 2017-08-31T07:16:34Z
dc.date.issued 2017-08-17
dc.identifier.uri http://hdl.handle.net/11356/1139
dc.description Janes-Forum is an annotated corpus of Slovene forums from websites med.over.net, avtomobilizem.com, and kvarkadabra.net from the period 2001-02 to 2015-01. The corpus is structured into forums, threads and posts, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. Due to protection of privacy and compliance with wishes of platform owners, usernames are not included in the metadata, and 'person', 'person derivative' and 'company name' named entities have been removed from the texts.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://doi.org/10.4312/slo2.0.2016.2.67-99
dc.relation.isreferencedby https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Forum
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject forums
dc.subject word normalisation
dc.subject named entities
dc.subject TEI
dc.title Forum corpus Janes-Forum 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 772953 texts
size.info 47066575 tokens
files.count 2
files.size 601073491
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=janes_forum
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=janes_forum


 Files in this item

 Download all files in this item (573.23 MB)
This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons, Attribution Required
Name Janes-Forum.TEI.zip
Size 303.81 MB
Format application/zip
Description Corpus in TEI format
MD5 e0b2f3e7ffed5c5219732a39bf8e4a0f
 File preview
  • Janes-Forum.TEI
    • janes.forum.xml (12 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.forum.back.xml (465 kB)
    • janes.forum.body.xml (2 GB)
    • 00README.txt (163 B)
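
The preview lists janes.forum.body.xml at about 2 GB uncompressed, so loading it into memory as a single tree is likely to be impractical. The following is a minimal sketch of a streaming pass that counts elements by name without unpacking the archive. The function names are hypothetical, and the sketch deliberately assumes nothing about which TEI elements the corpus uses: it only reports whatever names it encounters.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_stream):
    """Stream through an XML document and count elements by local name."""
    counts = Counter()
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        counts[local_name(elem.tag)] += 1
        # Release the element's children and text once it has been counted.
        # The emptied element stays attached to its parent until the parent
        # itself ends, so memory use depends on how many siblings are open.
        elem.clear()
    return counts


def count_elements_in_zip(zip_path, member):
    """Count elements in one XML member of a zip archive, read as a stream."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as xml_stream:
            return count_elements(xml_stream)
```

A call such as `count_elements_in_zip("Janes-Forum.TEI.zip", "Janes-Forum.TEI/janes.forum.body.xml")` would be the intended use. The member path is an assumption derived from the file preview above and should be checked against `zipfile.ZipFile(...).namelist()` first.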
Name Janes-Forum.vert.zip
Size 269.41 MB
Format application/zip
Description Derived corpus in vertical format
MD5 e8c518a4ef121febde45590763b680cf
