dc.contributor.author | Goli, Teja |
dc.contributor.author | Osrajnik, Eneja |
dc.contributor.author | Fišer, Darja |
dc.contributor.author | Erjavec, Tomaž |
dc.date.accessioned | 2017-01-20T14:05:33Z |
dc.date.available | 2017-01-20T14:05:33Z |
dc.date.issued | 2017-01-20 |
dc.identifier.uri | http://hdl.handle.net/11356/1087 |
dc.description | Janes-Kratko is a corpus of Slovene tweets manually annotated with shortening phenomena according to the supplied typology covering different types of spelling, lexical and syntactic shortenings. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation and word normalisation of non-standard Slovene and automatically annotated with morphosyntactic descriptions and lemmas. The corpus is further described in: GOLI, Teja, OSRAJNIK, Eneja, FIŠER, Darja. Analiza krajšanja slovenskih sporočil na družbenem omrežju Twitter. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 77-82. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/ |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://nl.ijs.si/janes/viri/rocno-oznaceni-korpusi/#Janes-Kratko |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9425-z |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://nl.ijs.si/janes/ |
dc.subject | computer-mediated communication |
dc.subject | shortening phenomena |
dc.subject | TEI |
dc.subject | manual annotation |
dc.title | CMC shortening corpus Janes-Kratko 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 777 texts |
size.info | 20222 tokens |
files.count | 4 |
files.size | 2019138 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=janes_kratko |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=janes_kratko |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name
- JTDH-2016_Goli-et-al_Analiza-krajsanja-slovenskih-sporocil.pdf
- Size
- 1.08 MB
- Format
- application/pdf
- Description
- JTDH'16 paper describing the corpus & analysis
- MD5
- 9a0ae9e11645a8267441c2c5b1d0ad5a
- Name
- Janes-kratko-typology.zip
- Size
- 116.58 KB
- Format
- application/zip
- Description
- Typology of shortenings (in Slovene)
- MD5
- b5d9d8661608ebd720621644618fe453
- Name
- Janes-Kratko.zip
- Size
- 590.91 KB
- Format
- application/zip
- Description
- Corpus in TEI format
- MD5
- 0a751664051e5a4a066cc1ae126f11ef
- Janes-Kratko
- msd-fslib-sl.xml (461 kB)
- janes.kratko.xml (17 kB)
- schema
- tei_janes_doc.html (2 MB)
- tei_janes.rng (399 kB)
- tei_janes_schema.xml (2 kB)
- tei_janes.zip (44 kB)
- tei_janes.rnc (188 kB)
- janes.kratko.body.xml (1 MB)
- Name
- Janes-Kratko.vert.zip
- Size
- 159.71 KB
- Format
- application/zip
- Description
- Corpus in vertical format
- MD5
- b5f209c8374aaa6f97c8e4323e94c2e3