dc.contributor.author | Popič, Damjan |
dc.contributor.author | Zupan, Katja |
dc.contributor.author | Logar, Polona |
dc.contributor.author | Kavčič, Teja |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2017-02-16T12:28:26Z |
dc.date.available | 2017-02-16T12:28:26Z |
dc.date.issued | 2017-02-16 |
dc.identifier.uri | http://hdl.handle.net/11356/1088 |
dc.description | Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation, and word normalisation, and automatically for morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/ |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | https://nl.ijs.si/janes/viri/rocno-oznaceni-korpusi/#Janes-Vejica |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-018-9425-z |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://nl.ijs.si/janes/ |
dc.subject | computer-mediated communication |
dc.subject | comma placement |
dc.subject | TEI |
dc.subject | manual annotation |
dc.title | Tweet comma corpus Janes-Vejica 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
sponsor | ARRS (Slovenian Research Agency) MR-37487 Young Researcher Programme nationalFunds |
size.info | 495 texts |
size.info | 14031 tokens |
files.count | 4 |
files.size | 1905142 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=janes_vejica |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=janes_vejica |
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name
- JTDH-2016_Popic-et-al_Raba-vejice-v-uporabniskih-spletnih-vsebinah.pdf
- Size
- 1.13 MB
- Format
- application/pdf
- Description
- JTDH'16 paper describing the corpus and analysis
- MD5
- 4fdac0ae810943445eb2bd03a68311c8

- Name
- Janes-vejica-typology.zip
- Size
- 68.96 KB
- Format
- application/zip
- Description
- Typology of comma error reasons (in Slovene)
- MD5
- 6d44e75f4826cc27cc368d381756c45b

- Name
- Janes-Vejica.zip
- Size
- 527.01 KB
- Format
- application/zip
- Description
- Corpus in TEI format
- MD5
- ef97343cad89b64eed3b19538a81a714
- Janes-Vejica
- msd-fslib-sl.xml (461 kB)
- schema
- tei_janes_doc.html (2 MB)
- tei_janes.rng (399 kB)
- tei_janes_schema.xml (2 kB)
- tei_janes.zip (44 kB)
- tei_janes.rnc (188 kB)
- janes.vejica.xml (18 kB)
- janes.vejica.body.xml (1 MB)

- Name
- Janes-Vejica.vert.zip
- Size
- 103.58 KB
- Format
- application/zip
- Description
- Corpus in vertical format
- MD5
- f6248c1483e993621d6cb020da5ffd3d
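
Each file above is listed with an MD5 checksum, which can be used to confirm that a download arrived intact. The following is a minimal sketch of such a check, not part of the repository itself: it assumes the four files were saved under their listed names in a single directory, and the function names `md5_of` and `verify_downloads` are hypothetical helpers introduced only for this illustration.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "JTDH-2016_Popic-et-al_Raba-vejice-v-uporabniskih-spletnih-vsebinah.pdf":
        "4fdac0ae810943445eb2bd03a68311c8",
    "Janes-vejica-typology.zip": "6d44e75f4826cc27cc368d381756c45b",
    "Janes-Vejica.zip": "ef97343cad89b64eed3b19538a81a714",
    "Janes-Vejica.vert.zip": "f6248c1483e993621d6cb020da5ffd3d",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch),
    or None (file not found in the assumed download directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, outcome in verify_downloads().items():
        print(f"{labels[outcome]:9} {name}")
```

A file reported as MISMATCH was probably truncated or altered in transit and should be downloaded again.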