
 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Batanović, Vuk
dc.contributor.author Miličević, Maja
dc.contributor.author Samardžić, Tanja
dc.date.accessioned 2019-09-11T15:40:03Z
dc.date.available 2019-09-11T15:40:03Z
dc.date.issued 2019-07-28
dc.identifier.uri http://hdl.handle.net/11356/1240
dc.description ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Serbian. Each tweet also carries automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). Compared with version 2.0, version 2.1 corrects some annotation errors and adds morphosyntactic annotations in the Universal Dependencies formalism alongside the MULTEXT-East morphosyntactic descriptions. The corpus is now also available in CoNLL-U format.
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://dx.doi.org/10.4312/slo2.0.2016.2.156-188
dc.relation.replaces http://hdl.handle.net/11356/1171
dc.relation.isreplacedby http://hdl.handle.net/11356/1794
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://reldi.spur.uzh.ch/
dc.subject computer-mediated communication
dc.subject tokenisation
dc.subject word normalisation
dc.subject part-of-speech tagging
dc.subject lemmatisation
dc.subject named entities
dc.subject manual annotation
dc.subject TEI
dc.title Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor Swiss National Science Foundation 160501 ReLDI Other
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 3748 texts
size.info 91781 tokens
files.count 4
files.size 4733374
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=reldi_sr
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=reldi_sr&struct_attr_stats=1


 Files in this item

Total size of all files in item: 4.51 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Name ReLDI-sr.zip
Size 2.44 MB
Format application/zip
Description Corpus in TEI format
MD5 6693e03fc73cb51a9c3a1cc186f9fdeb
File preview
  • ReLDI-sr
    • reldi-sr.body.xml (9 MB)
    • msd-fslib-hbs.xml (89 kB)
    • schema
      • tei_clarin_doc.xml (7 MB)
      • tei_clarin.zip (87 kB)
      • tei_clarin_schema.xml (3 kB)
      • tei_clarin.rnc (282 kB)
      • tei_clarin.dtd (229 kB)
      • tei_clarin_doc.html (7 MB)
      • tei_clarin.rng (579 kB)
    • reldi-sr.xml (7 kB)
    • 00README.txt (184 B)

Name ReLDI-sr.vert.zip
Size 818.37 KB
Format application/zip
Description Corpus in derived vertical (Sketch Engine / CQP) format
MD5 10ff22c57a2a2926344202ea2999bf90

Name ReLDI-sr.conllu.zip
Size 1.02 MB
Format application/zip
Description Corpus in derived CoNLL-U format
MD5 8bd008f2f2887d3fa21d1f439e4ee65d
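
For readers who want to inspect the CoNLL-U export, the following is a minimal sketch of a reader for the standard ten-column CoNLL-U layout. It assumes only what the CoNLL-U specification defines (tab-separated columns, `#` comment lines, blank lines between sentences). The function name `read_conllu` and the two-token sample are invented for illustration and are not taken from the corpus; how this corpus fills the XPOS, FEATS and MISC columns should be checked against the distributed files.

```python
from typing import Iterable, Iterator

# Column names as defined by the CoNLL-U specification.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per sentence with its comment lines and token rows."""
    comments, tokens = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
            comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
            tokens.append(dict(zip(FIELDS, cols)))
    if tokens:
        # Handle input that does not end with a blank line.
        yield {"comments": comments, "tokens": tokens}


# Invented sample in CoNLL-U layout (NOT taken from the corpus).
SAMPLE = (
    "# sent_id = example.1\n"
    "# text = Zdravo svete\n"
    "1\tZdravo\tzdravo\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tsvete\tsvet\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE.splitlines()):
        print(sentence["comments"])
        for token in sentence["tokens"]:
            print(token["form"], token["lemma"], token["upos"])
```

To read a real file, pass an open text handle instead of the sample, for example `read_conllu(open(path, encoding="utf-8"))`, where `path` points at a `.conllu` file extracted from the archive.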

Name ReLDI-NormTag-Guidelines.pdf
Size 261.5 KB
Format PDF
Description Annotation guidelines (in Serbo-Croatian)
MD5 237ec14a7885af2b6d7cd3e3853ec70a
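
The MD5 values listed for each file can be used to confirm that a download is intact. The sketch below is one way to do this with Python's standard `hashlib` module. The checksums are copied from the listing above; the helper names `md5_of_file` and `check_downloads`, and the assumption that the files sit in a single local directory under their listed names, are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "ReLDI-sr.zip": "6693e03fc73cb51a9c3a1cc186f9fdeb",
    "ReLDI-sr.vert.zip": "10ff22c57a2a2926344202ea2999bf90",
    "ReLDI-sr.conllu.zip": "8bd008f2f2887d3fa21d1f439e4ee65d",
    "ReLDI-NormTag-Guidelines.pdf": "237ec14a7885af2b6d7cd3e3853ec70a",
}


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory=".") -> dict:
    """Map each listed file to True (match), False (mismatch) or None (absent)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed local location of the download
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in check_downloads().items():
        print(f"{name}: {labels[status]}")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it is not a safeguard against deliberate tampering.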
