dc.contributor.author Goli, Teja
dc.contributor.author Osrajnik, Eneja
dc.contributor.author Fišer, Darja
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2017-01-20T14:05:33Z
dc.date.available 2017-01-20T14:05:33Z
dc.date.issued 2017-01-20
dc.identifier.uri http://hdl.handle.net/11356/1087
dc.description Janes-Kratko is a corpus of Slovene tweets manually annotated for shortening phenomena according to the supplied typology, which covers spelling, lexical and syntactic shortenings. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation and word normalisation of non-standard Slovene, and automatically annotated with morphosyntactic descriptions and lemmas. The corpus is further described in: GOLI, Teja, OSRAJNIK, Eneja, FIŠER, Darja. Analiza krajšanja slovenskih sporočil na družbenem omrežju Twitter. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 77-82. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://nl.ijs.si/janes/viri/rocno-oznaceni-korpusi/#Janes-Kratko
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject Twitter
dc.subject shortening phenomena
dc.subject TEI
dc.subject manual annotation
dc.title CMC shortening corpus Janes-Kratko 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 777 texts
size.info 20222 tokens
files.count 4
files.size 2019138
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=janes_kratko
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=janes_kratko


Files in this item

All files in item: 1.93 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Name: JTDH-2016_Goli-et-al_Analiza-krajsanja-slovenskih-sporocil.pdf
Size: 1.08 MB
Format: PDF
Description: JTDH'16 paper describing the corpus & analysis
MD5: 9a0ae9e11645a8267441c2c5b1d0ad5a

Name: Janes-kratko-typology.zip
Size: 116.58 KB
Format: application/zip
Description: Typology of shortenings (in Slovene)
MD5: b5d9d8661608ebd720621644618fe453
Contents:
    • Janes-kratko-typology.tbl (1 kB)
    • Janes-kratko-typology.xml (5 kB)
    • Janes-kratko-typology.docx (122 kB)

Name: Janes-Kratko.zip
Size: 590.91 KB
Format: application/zip
Description: Corpus in TEI format
MD5: 0a751664051e5a4a066cc1ae126f11ef
Contents:
  • Janes-Kratko
    • msd-fslib-sl.xml (461 kB)
    • janes.kratko.xml (17 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.kratko.body.xml (1 MB)

Name: Janes-Kratko.vert.zip
Size: 159.71 KB
Format: application/zip
Description: Corpus in vertical format
MD5: b5f209c8374aaa6f97c8e4323e94c2e3
Contents:
    • janes.kratko.vert (714 kB)
    • janes.kratko.regi (1 kB)
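
The MD5 values listed for the four files can be used to check that a download arrived intact. The following is a minimal sketch in Python, assuming the files were saved under the names shown in this record; the helper names `md5_of_file` and `verify_download` are hypothetical and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list in this record.
EXPECTED_MD5 = {
    "JTDH-2016_Goli-et-al_Analiza-krajsanja-slovenskih-sporocil.pdf":
        "9a0ae9e11645a8267441c2c5b1d0ad5a",
    "Janes-kratko-typology.zip": "b5d9d8661608ebd720621644618fe453",
    "Janes-Kratko.zip": "0a751664051e5a4a066cc1ae126f11ef",
    "Janes-Kratko.vert.zip": "b5f209c8374aaa6f97c8e4323e94c2e3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a file's MD5 with the value listed for its file name.

    Raises KeyError if the file name is not one of the four listed files.
    """
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]


if __name__ == "__main__":
    # Assumption: the downloads sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_download(name) else "MISMATCH")
        else:
            print(name, "not downloaded")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.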
