dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2017-08-31T07:08:15Z |
dc.date.available | 2017-08-31T07:08:15Z |
dc.date.issued | 2017-08-28 |
dc.identifier.uri | http://hdl.handle.net/11356/1137 |
dc.description | Janes-Wiki is an annotated corpus of discussion pages from the Slovene Wikipedia from the period 2003-08 to 2017-06. The corpus contains page and user talks and is structured into individual pages and their comments, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://revije.ff.uni-lj.si/slovenscina2/article/view/7003 |
dc.relation.isreferencedby | http://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Wiki |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9425-z |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://nl.ijs.si/janes/ |
dc.subject | computer-mediated communication |
dc.subject | Wikipedia |
dc.subject | word normalisation |
dc.subject | named entities |
dc.subject | TEI |
dc.title | Wikipedia talk corpus Janes-Wiki 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
contact.person | Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 78765 texts |
size.info | 5008067 tokens |
files.count | 2 |
files.size | 58039005 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=janes_wiki |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=janes_wiki |
Files in this item
- Name: Janes-Wiki.TEI.zip
- Size: 29.56 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: b75f99ae0dec3164891202c342682b46
- Janes-Wiki.TEI
- janes.wiki.back.xml (465 kB)
- janes.wiki.body.xml (262 MB)
- schema
- tei_janes_doc.html (2 MB)
- tei_janes.rng (399 kB)
- tei_janes_schema.xml (2 kB)
- tei_janes.zip (44 kB)
- tei_janes.rnc (188 kB)
- janes.wiki.xml (12 kB)
- 00README.txt (168 B)

- Name: Janes-Wiki.vert.zip
- Size: 25.79 MB
- Format: application/zip
- Description: Derived corpus in vertical format
- MD5: 69f4e9f7a9e3c1f3cc5562df1a3d51c9
- Janes-Wiki.vert
- janes_wiki.vert (180 MB)
- janes_wiki.regi (5 kB)
- 00README.txt (168 B)
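
The MD5 values listed for the two archives can be used to check that a download is intact. The following is a minimal sketch, not part of the original record: it assumes both zip files were saved under their listed names in the current directory, and the function names `md5_of_file` and `verify_downloads` are hypothetical helpers introduced only for illustration.

```python
import hashlib
from pathlib import Path

# MD5 values copied from this record's file listing.
EXPECTED_MD5 = {
    "Janes-Wiki.TEI.zip": "b75f99ae0dec3164891202c342682b46",
    "Janes-Wiki.vert.zip": "69f4e9f7a9e3c1f3cc5562df1a3d51c9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and its MD5 matches the record."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.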