dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2017-08-31T07:16:34Z |
dc.date.available | 2017-08-31T07:16:34Z |
dc.date.issued | 2017-08-17 |
dc.identifier.uri | http://hdl.handle.net/11356/1139 |
dc.description | Janes-Forum is an annotated corpus of Slovene forums from the websites med.over.net, avtomobilizem.com, and kvarkadabra.net, covering the period 2001-02 to 2015-01. The corpus is structured into forums, threads and posts, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. To protect privacy and comply with the wishes of the platform owners, usernames are not included in the metadata, and 'person', 'person derivative' and 'company name' named entities have been removed from the texts. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://doi.org/10.4312/slo2.0.2016.2.67-99 |
dc.relation.isreferencedby | https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Forum |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9425-z |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://nl.ijs.si/janes/ |
dc.subject | computer-mediated communication |
dc.subject | forums |
dc.subject | word normalisation |
dc.subject | named entities |
dc.subject | TEI |
dc.title | Forum corpus Janes-Forum 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
contact.person | Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 772953 texts |
size.info | 47066575 tokens |
files.count | 2 |
files.size | 601073491 |
featuredService.kontext | search|https://www.clarin.si/kontext/first_form?corpname=janes_forum |
featuredService.noske | search|https://www.clarin.si/ske/#dashboard?corpname=janes_forum |
Files in this item
Total size of all files in this item: 573.23 MB
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)



- Name: Janes-Forum.TEI.zip
- Size: 303.81 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: e0b2f3e7ffed5c5219732a39bf8e4a0f
- Janes-Forum.TEI
  - janes.forum.xml (12 kB)
  - schema
    - tei_janes_doc.html (2 MB)
    - tei_janes.rng (399 kB)
    - tei_janes_schema.xml (2 kB)
    - tei_janes.zip (44 kB)
    - tei_janes.rnc (188 kB)
  - janes.forum.back.xml (465 kB)
  - janes.forum.body.xml (2 GB)
  - 00README.txt (163 B)
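
The MD5 values in the file listing can be used to check that a download arrived intact. The following is a minimal sketch, assuming the archives were saved under the file names shown in this record; the function names `md5_of_file` and `verify_download` are hypothetical helpers, not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this record.
EXPECTED_MD5 = {
    "Janes-Forum.TEI.zip": "e0b2f3e7ffed5c5219732a39bf8e4a0f",
    "Janes-Forum.vert.zip": "e8c518a4ef121febde45590763b680cf",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == expected[name]
```

Calling `verify_download("Janes-Forum.TEI.zip")` in the download directory would return `True` for an intact copy. Reading in chunks keeps memory use flat, which matters for archives of several hundred megabytes.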

- Name: Janes-Forum.vert.zip
- Size: 269.41 MB
- Format: application/zip
- Description: Derived corpus in vertical format
- MD5: e8c518a4ef121febde45590763b680cf
- Janes-Forum.vert
  - janes_forum.vert (1 GB)
  - janes_forum.regi (5 kB)
  - 00README.txt (163 B)
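
For readers unfamiliar with the vertical format, here is a rough sketch of how such a file could be scanned. It assumes the usual convention of corpus tools such as (No)Sketch Engine: one token per line with tab-separated attributes, and XML-like tags on their own lines for structures. The actual attribute columns and structure names of `janes_forum.vert` are not stated in this record (they are presumably declared in `janes_forum.regi`), so the sample lines in the comments are invented for illustration, and `count_vertical` is a hypothetical helper.

```python
import re

# Assumption: structure lines look like XML tags (<text ...>, </s>, <g/>);
# every other non-empty line is one token with tab-separated attributes.
STRUCTURE_LINE = re.compile(r"^</?[A-Za-z][^>]*>$")
TAG_NAME = re.compile(r"^<([A-Za-z][\w.-]*)")


def count_vertical(lines):
    """Count tokens and opening structure tags in an iterable of lines."""
    tokens = 0
    structures = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if STRUCTURE_LINE.match(line):
            # Count each structure once, at its opening (or empty) tag.
            if not line.startswith("</"):
                name = TAG_NAME.match(line).group(1)
                structures[name] = structures.get(name, 0) + 1
        else:
            tokens += 1
    return tokens, structures


# Invented sample; the real file may use different columns and tags.
sample = ['<text id="x">', "<s>", "To\tto", "je\tbiti", "<g/>", ".\t.", "</s>", "</text>"]
print(count_vertical(sample))
```

Because `count_vertical` accepts any iterable of lines, it could be run on the extracted file with `open("janes_forum.vert", encoding="utf-8")` without loading the whole file into memory. The resulting counts could then be compared with the `size.info` values in this record, bearing in mind that the structure names are assumptions.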