Show simple item record

 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-08-31T07:08:15Z
dc.date.available 2017-08-31T07:08:15Z
dc.date.issued 2017-08-28
dc.identifier.uri http://hdl.handle.net/11356/1137
dc.description Janes-Wiki is an annotated corpus of discussion pages from the Slovene Wikipedia from the period 2003-08 to 2017-06. The corpus contains page and user talks and is structured into individual pages and their comments, together with their metadata. The texts in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities.
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://revije.ff.uni-lj.si/slovenscina2/article/view/7003
dc.relation.isreferencedby http://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Wiki
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject Wikipedia
dc.subject word normalisation
dc.subject named entities
dc.subject TEI
dc.title Wikipedia talk corpus Janes-Wiki 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Darja Fišer darja.fiser@ff.uni-lj.si Faculty of Arts, University of Ljubljana
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 78765 texts
size.info 5008067 tokens
files.count 2
files.size 58039005
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=janes_wiki
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=janes_wiki
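
The record above is a flat list of field/value pairs in which some fields (for example `dc.contributor.author`, `dc.subject`, `size.info`) occur more than once. As a minimal sketch, assuming the first space on each line separates the field name from its value, the pairs can be collected into a multi-valued mapping. The function name `parse_record` and the `SAMPLE` excerpt are illustrative only and are not part of the repository's tooling.

```python
from collections import defaultdict

# A few lines copied from the record above; the full record has the same shape.
SAMPLE = """\
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.date.issued 2017-08-28
dc.subject Wikipedia
dc.subject TEI
size.info 78765 texts
size.info 5008067 tokens
"""


def parse_record(text):
    """Group values by field name (hypothetical helper).

    Assumes the first space on a line separates the field from its value.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition(" ")
        fields[name].append(value)
    return dict(fields)


record = parse_record(SAMPLE)
print(record["dc.contributor.author"])
print(record["size.info"])
```

Keeping every value in a list, even for fields that occur once, avoids special-casing repeated fields when the record is processed further.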


 Files in this item

 Download all files in this item (55.35 MB)
Name: Janes-Wiki.TEI.zip
Size: 29.56 MB
Format: application/zip
Description: Corpus in TEI format
MD5: b75f99ae0dec3164891202c342682b46
File preview:
  • Janes-Wiki.TEI
    • janes.wiki.back.xml (465 kB)
    • janes.wiki.body.xml (262 MB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.wiki.xml (12 kB)
    • 00README.txt (168 B)
Name: Janes-Wiki.vert.zip
Size: 25.79 MB
Format: application/zip
Description: Derived corpus in vertical format
MD5: 69f4e9f7a9e3c1f3cc5562df1a3d51c9
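
The MD5 values listed for the two archives can be used to check a download for corruption. The following is a minimal sketch, assuming both ZIP files have been saved under their original names in the current directory. The helper names `md5_of_file` and `check_downloads` are illustrative and not part of any CLARIN.SI tool.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "Janes-Wiki.TEI.zip": "b75f99ae0dec3164891202c342682b46",
    "Janes-Wiki.vert.zip": "69f4e9f7a9e3c1f3cc5562df1a3d51c9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in check_downloads().items():
        print(f"{name}: {labels[status]}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.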
