
 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Farkaš, Daša
dc.contributor.author Klubička, Filip
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Miličević, Maja
dc.contributor.author Vuković, Teodora
dc.date.accessioned 2017-05-15T13:43:51Z
dc.date.available 2017-05-15T13:43:51Z
dc.date.issued 2017-05-14
dc.identifier.uri http://hdl.handle.net/11356/1120
dc.description ReLDI-NormTag-sr 1.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). As an update to version 1.0, 1.1 corrects some minor errors. The corpus construction is (partially) described in: MILIČEVIĆ, Maja, LJUBEŠIĆ, Nikola. Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0: empirical, applied and interdisciplinary research, 4/2, 2016. ISSN 2335-2736. http://dx.doi.org/10.4312/slo2.0.2016.2.156-188
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.relation.replaces http://hdl.handle.net/11356/1096
dc.relation.isreplacedby http://hdl.handle.net/11356/1171
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject computer-mediated communication
dc.subject tokenisation
dc.subject word normalisation
dc.subject tagging
dc.subject lemmatisation
dc.subject manual annotation
dc.subject TEI
dc.title Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor Swiss National Science Foundation 160501 ReLDI Other
sponsor ARRS (Slovenian Research Agency) J6-6842 JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 3748 texts
size.info 91781 tokens
files.count 3
files.size 2102052


Files in this item

Download all files in this item (2 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons; attribution required.
Name ReLDI-sr.zip
Size 1.11 MB
Format application/zip
Description Corpus in TEI format
MD5 3f2275ec64448ee9ac16262d95929c24
File preview
  • schema
    • tei_janes_doc.html (2 MB)
    • tei_janes.rng (399 kB)
    • tei_janes_schema.xml (2 kB)
    • tei_janes.zip (44 kB)
    • tei_janes.rnc (188 kB)
    • reldi-sr.body.xml (6 MB)
    • msd-fslib-bs.xml (82 kB)
    • reldi-sr.xml (5 kB)
Name ReLDI-sr.vert.zip
Size 656.46 KB
Format application/zip
Description Derived corpus in vertical format
MD5 03dd441c155df9c25421a563645138d0
File preview
    • reldi_sr.vert (2 MB)
    • reldi_sr.regi (1 kB)
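
For readers who want to inspect the derived vertical file, the following is a minimal Python sketch of a reader for the one-token-per-line vertical format, in which structural markup sits on lines of its own and token attributes are tab-separated. The record does not state which columns or structural tags reldi_sr.vert contains, so the function name `parse_vertical`, the tag-detection heuristic, and the two-column sample lines are illustrative assumptions, not the corpus's documented layout.

```python
def parse_vertical(lines):
    """Yield ("tag", text) for structural lines and ("token", columns) for token lines.

    Assumption: a line that starts with "<" and ends with ">" is structural
    markup; every other non-empty line is one token with tab-separated columns.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("<") and line.endswith(">"):
            yield ("tag", line)
        else:
            yield ("token", line.split("\t"))


# Made-up sample lines (NOT taken from the corpus), only to exercise the parser.
SAMPLE = [
    '<text id="demo">\n',
    "<s>\n",
    "form1\tlemma1\n",
    "form2\tlemma2\n",
    "</s>\n",
    "</text>\n",
]

events = list(parse_vertical(SAMPLE))
token_count = sum(1 for kind, _ in events if kind == "token")
tag_count = sum(1 for kind, _ in events if kind == "tag")
print(token_count, "tokens,", tag_count, "structural lines")
```

To apply the sketch to the real data, pass an open file handle for reldi_sr.vert (extracted from ReLDI-sr.vert.zip) instead of `SAMPLE`. UTF-8 is a reasonable guess for the encoding, but the record does not confirm it.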
Name ReLDI-NormTag-Guidelines.pdf
Size 261.5 KB
Format PDF
Description Annotation guidelines (in Serbo-Croatian)
MD5 237ec14a7885af2b6d7cd3e3853ec70a
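
The MD5 values listed for the three files can be used to check that a download arrived intact. The following is a minimal Python sketch of such a check. The helper names `md5_of` and `verify_downloads` are illustrative, and it assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file list in this record.
EXPECTED_MD5 = {
    "ReLDI-sr.zip": "3f2275ec64448ee9ac16262d95929c24",
    "ReLDI-sr.vert.zip": "03dd441c155df9c25421a563645138d0",
    "ReLDI-NormTag-Guidelines.pdf": "237ec14a7885af2b6d7cd3e3853ec70a",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed location of the download
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

MD5 is adequate here for detecting accidental corruption during transfer, but it is not a safeguard against deliberate tampering.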
