cSMTiser: word standardisation

 
CLARIN.SI data & tools
  Authors
Ljubešić, Nikola; Perovšek, Matic; Erjavec, Tomaž
  Item identifier
http://hdl.handle.net/11356/1169
 Project URL
https://github.com/clarinsi/csmtiser
 Date issued
2017-11-27
 Type
toolService
 Language(s)
Slovenian
 Description
Word standardisation of non-standard language as found in user-generated content, using cSMTiser (https://github.com/clarinsi/csmtiser), a tool for text normalisation via character-level machine translation. The tool has been trained on the Janes-Norm dataset (http://hdl.handle.net/11356/1084) and background resources.
 Publisher
Jožef Stefan Institute
 Acknowledgement
ARRS (Slovenian Research Agency) J6-6842 "JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene"
Swiss National Science Foundation 160501 "ReLDI"
ARRS (Slovenian Research Agency) P2-103 "Knowledge Technologies"
 Subject(s)
word normalisation
 Collection(s)
CLARIN.SI WebLicht
This platform runs under the software developed for the LINDAT/CLARIAH-CZ repository for linguistics, available on GitHub

CLARIN.SI is supported by the Ministry of Education, Science and Sport of the Republic of Slovenia
under the Programme of "Research Infrastructures".