dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Batanović, Vuk |
dc.contributor.author | Miličević, Maja |
dc.contributor.author | Samardžić, Tanja |
dc.date.accessioned | 2019-09-11T15:40:03Z |
dc.date.available | 2019-09-11T15:40:03Z |
dc.date.issued | 2019-07-28 |
dc.identifier.uri | http://hdl.handle.net/11356/1240 |
dc.description | ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Serbian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). As an update to version 2.0, version 2.1 corrects some annotation errors and adds morphosyntactic annotations in the Universal Dependencies formalism in addition to the MULTEXT-East morphosyntactic descriptions. The corpus is now also available in CoNLL-U format. |
dc.language.iso | srp |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | http://dx.doi.org/10.4312/slo2.0.2016.2.156-188 |
dc.relation.replaces | http://hdl.handle.net/11356/1171 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1794 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.subject | computer-mediated communication |
dc.subject | tokenisation |
dc.subject | word normalisation |
dc.subject | part-of-speech tagging |
dc.subject | lemmatisation |
dc.subject | named entities |
dc.subject | manual annotation |
dc.subject | TEI |
dc.title | Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | Swiss National Science Foundation 160501 ReLDI Other |
sponsor | ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
size.info | 3748 texts |
size.info | 91781 tokens |
files.count | 4 |
files.size | 4733374 |
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=reldi_sr |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=reldi_sr |
Files in this item
Download all files in item (4.51 MB)
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name
- ReLDI-sr.zip
- Size
- 2.44 MB
- Format
- application/zip
- Description
- Corpus in TEI format
- MD5
- 6693e03fc73cb51a9c3a1cc186f9fdeb

- Name
- ReLDI-sr.vert.zip
- Size
- 818.37 KB
- Format
- application/zip
- Description
- Corpus in derived vertical (Sketch Engine / CQP) format
- MD5
- 10ff22c57a2a2926344202ea2999bf90
- Contents
- ReLDI-sr.vert
- reldi_sr.vert (5 MB)
- reldi_sr.regi (2 kB)
- 00README.txt (184 B)

- Name
- ReLDI-sr.conllu.zip
- Size
- 1.02 MB
- Format
- application/zip
- Description
- Corpus in derived CoNLL-U format
- MD5
- 8bd008f2f2887d3fa21d1f439e4ee65d
- Contents
- ReLDI-sr.conllu
- ReLDI-sr.conllu (6 MB)
- 00README.txt (184 B)
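
A minimal sketch of reading the CoNLL-U export, assuming it follows the standard ten-column, tab-separated CoNLL-U layout with `#` comment lines and blank lines between sentences. The function name `read_conllu` and the two-token sample are illustrative and not taken from the corpus; which comment lines the corpus actually carries, and which columns it fills, should be checked against the 00README.txt in the archive.

```python
# Standard CoNLL-U column names, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Made-up sample for illustration only; unspecified columns are "_".
SAMPLE = (
    "# sent_id = example.1\n"
    "1\tDobar\tdobar\tADJ\t_\t_\t_\t_\t_\t_\n"
    "2\tdan\tdan\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "\n"
)


def read_conllu(lines):
    """Yield (comments, tokens) for each sentence in an iterable of lines."""
    comments, tokens = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield comments, tokens
            comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
            tokens.append(dict(zip(FIELDS, cols)))
    # Emit a final sentence that is not followed by a blank line.
    if tokens:
        yield comments, tokens


if __name__ == "__main__":
    for comments, tokens in read_conllu(SAMPLE.splitlines()):
        print(comments, [(t["form"], t["upos"]) for t in tokens])
```

For the real data, the same function can be fed an open file handle (opened with `encoding="utf-8"`) on the extracted `.conllu` file instead of `SAMPLE.splitlines()`.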

- Name
- ReLDI-NormTag-Guidelines.pdf
- Size
- 261.5 KB
- Format
- application/pdf
- Description
- Annotation guidelines (in Serbo-Croatian)
- MD5
- 237ec14a7885af2b6d7cd3e3853ec70a
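
The MD5 values in the file listing can be used to check downloads for corruption. The following is a minimal sketch, assuming the four files were saved under their listed names in one directory; the helper names `md5_of` and `verify` are illustrative and not part of any repository tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing above.
EXPECTED_MD5 = {
    "ReLDI-sr.zip": "6693e03fc73cb51a9c3a1cc186f9fdeb",
    "ReLDI-sr.vert.zip": "10ff22c57a2a2926344202ea2999bf90",
    "ReLDI-sr.conllu.zip": "8bd008f2f2887d3fa21d1f439e4ee65d",
    "ReLDI-NormTag-Guidelines.pdf": "237ec14a7885af2b6d7cd3e3853ec70a",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.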