
 
dc.contributor.author Lebar Bajec, Iztok
dc.contributor.author Bajec, Marko
dc.contributor.author Bajec, Žan
dc.contributor.author Rizvič, Mitja
dc.date.accessioned 2022-12-02T10:43:49Z
dc.date.available 2022-12-02T10:43:49Z
dc.date.issued 2022-12-01
dc.identifier.uri http://hdl.handle.net/11356/1735
dc.description This Punctuation and Capitalisation model was trained following the NVIDIA NeMo Punctuation and Capitalisation recipe (for details see the official NVIDIA NeMo P&C documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/punctuation_and_capitalization.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It restores punctuation (,.!?) and capital letters in lowercased, unpunctuated Slovene text. The training corpus was built from publicly available datasets and a small portion of proprietary data. In total, the training corpus consisted of 38,829,529 sentences and the validation corpus of 2,092,497 sentences.
dc.language.iso slv
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby https://github.com/clarinsi/Slovene_punctuator
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/en/speech-technologies
dc.subject punctuation
dc.subject capitalisation
dc.subject NeMo
dc.subject model
dc.title Slovene Punctuation and Capitalisation model RSDO-DS2-P&C 3.6
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
demo.uri https://www.slovenscina.eu/en/razpoznavalnik
contact.person Iztok Lebar Bajec ilb@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 1
files.size 406589108


 Files in this item

This item is Publicly Available and licensed under: Apache License 2.0

Name: sl-SI_GEN_nemo-3.6.tar.zst
Size: 387.75 MB
Format: Unknown
Description: RSDO DS2 P&C 3.6
MD5: 241ce5d62c822290b0b33b8327825d9e
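
The record lists a byte count (files.size 406589108) and an MD5 checksum for the archive, so a downloaded copy can be checked against both before it is unpacked. The following is a minimal sketch of such a check, not part of the published tooling: the function names md5_of_file and verify_download are hypothetical, and it assumes the archive has already been saved locally under the file name shown in the record.

```python
import hashlib
from pathlib import Path

# Values copied from the item record.
EXPECTED_MD5 = "241ce5d62c822290b0b33b8327825d9e"
EXPECTED_SIZE = 406589108  # bytes, from files.size


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte count and the MD5 digest match.

    Hypothetical helper: the size is compared first because it is cheap
    and catches truncated downloads without hashing the whole file.
    """
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Calling verify_download("sl-SI_GEN_nemo-3.6.tar.zst") on an intact copy should return True. The listed size of 387.75 MB corresponds to 406589108 bytes when expressed in units of 1024 × 1024 bytes. MD5 is used here only because it is the checksum the record provides; it detects accidental corruption, not deliberate tampering.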
