dc.contributor.author Popič, Damjan
dc.contributor.author Zupan, Katja
dc.contributor.author Logar, Polona
dc.contributor.author Kavčič, Teja
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.date.accessioned 2017-02-16T12:28:26Z
dc.date.available 2017-02-16T12:28:26Z
dc.date.issued 2017-02-16
dc.identifier.uri http://hdl.handle.net/11356/1088
dc.description Janes-Vejica is a corpus of Slovene tweets in which commas are annotated with the reason for their correct or incorrect use, according to the supplied typology. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation and word normalisation, and automatically annotated with morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah [Comma usage in user-generated web content]. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://nl.ijs.si/janes/viri/rocno-oznaceni-korpusi/#Janes-Vejica
dc.relation.isreferencedby https://doi.org/10.1007/s10579-018-9425-z
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://nl.ijs.si/janes/
dc.subject computer-mediated communication
dc.subject Twitter
dc.subject comma placement
dc.subject TEI
dc.subject manual annotation
dc.title Tweet comma corpus Janes-Vejica 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor ARRS (Slovenian Research Agency) MR-37487 Young Researcher Programme nationalFunds
size.info 495 texts
size.info 14031 tokens
files.count 4
files.size 1905142 (bytes)
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=janes_vejica
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=janes_vejica


Files in this item

Total size of all files in the item: 1.82 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: JTDH-2016_Popic-et-al_Raba-vejice-v-uporabniskih-spletnih-vsebinah.pdf
Size: 1.13 MB
Format: PDF
Description: JTDH'16 paper describing the corpus and analysis
MD5: 4fdac0ae810943445eb2bd03a68311c8

Name: Janes-vejica-typology.zip
Size: 68.96 KB
Format: application/zip
Description: Typology of comma error reasons (in Slovene)
MD5: 6d44e75f4826cc27cc368d381756c45b
Contents:
    • Janes-vejica-typology.xml (6 kB)
    • Janes-vejica-typology.docx (18 kB)
    • Janes-vejica-typology.txt (1 kB)
    • Janes-vejica-typology.pdf (56 kB)

Name: Janes-Vejica.zip
Size: 527.01 KB
Format: application/zip
Description: Corpus in TEI format
MD5: ef97343cad89b64eed3b19538a81a714
Contents:
  • Janes-Vejica
    • msd-fslib-sl.xml (461 kB)
    • schema
      • tei_janes_doc.html (2 MB)
      • tei_janes.rng (399 kB)
      • tei_janes_schema.xml (2 kB)
      • tei_janes.zip (44 kB)
      • tei_janes.rnc (188 kB)
    • janes.vejica.xml (18 kB)
    • janes.vejica.body.xml (1 MB)

Name: Janes-Vejica.vert.zip
Size: 103.58 KB
Format: application/zip
Description: Corpus in vertical format
MD5: f6248c1483e993621d6cb020da5ffd3d
Contents:
    • janes.vejica.vert (471 kB)
    • janes.vejica.regi (1 kB)
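
The vertical file janes.vejica.vert presumably follows the one-token-per-line convention read by the (No)Sketch Engine and KonText services listed above: each token on its own line with tab-separated positional attributes, and structures such as texts and sentences marked by XML-like tag lines. The minimal sketch below assumes sentences are delimited by <s> and </s> lines and skips any tokens outside them. It does not interpret the columns; their meaning should be taken from janes.vejica.regi (presumably the corpus registry file). The names read_vertical_sentences and STRUCT_RE are hypothetical, and the demo data is invented, not taken from the corpus.

```python
import re
from typing import Iterable, Iterator, List, Optional

# A structural line looks like <s>, </s>, <text id="...">, or <g/>.
# Tweet tokens such as the emoticon "<3" do not match, because a tag name
# must start with a letter or underscore.
STRUCT_RE = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)(?:\s[^>]*)?/?>$")


def read_vertical_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of tokens; each token is the list of its
    tab-separated positional attributes (column meanings not interpreted)."""
    sentence: Optional[List[List[str]]] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        match = STRUCT_RE.match(line)
        if match:
            closing, name = match.group(1), match.group(2)
            if name == "s":
                if closing:
                    # </s>: emit the finished sentence
                    if sentence is not None:
                        yield sentence
                    sentence = None
                else:
                    # <s> or <s ...>: start a new sentence
                    sentence = []
            # other structures (<text>, <g/>, ...) are ignored
            continue
        if sentence is not None:
            sentence.append(line.split("\t"))


if __name__ == "__main__":
    # Hypothetical two-column sample (word form, lemma); not corpus data.
    demo = ['<text id="t1">', "<s>", "Ja\tja", ",\t,", "<3\t<3", "</s>", "</text>"]
    for sent in read_vertical_sentences(demo):
        print(" ".join(tok[0] for tok in sent))
```

To read the actual file, pass an open file handle, e.g. open("janes.vejica.vert", encoding="utf-8"), assuming the file is UTF-8 encoded. Matching tag lines with a regular expression rather than checking for a leading "<" keeps emoticon tokens from being mistaken for markup.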

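The MD5 checksums listed for each file can be used to check that a download is complete and uncorrupted. The minimal sketch below assumes the four files have been saved, still zipped and under the names listed above, into a single directory. The helper names md5_of and verify are hypothetical; the checksums are copied from this record.

```python
import hashlib
from pathlib import Path

# File name -> MD5 hex digest, as listed in this record.
EXPECTED_MD5 = {
    "JTDH-2016_Popic-et-al_Raba-vejice-v-uporabniskih-spletnih-vsebinah.pdf": "4fdac0ae810943445eb2bd03a68311c8",
    "Janes-vejica-typology.zip": "6d44e75f4826cc27cc368d381756c45b",
    "Janes-Vejica.zip": "ef97343cad89b64eed3b19538a81a714",
    "Janes-Vejica.vert.zip": "f6248c1483e993621d6cb020da5ffd3d",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

A missing file is reported as a failure rather than raising an error, so a partial download shows up in the same summary.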