Neural Machine Translation model for Slovene-English language pair RSDO-DS4-NMT 1.2.6

Name: Neural Machine Translation model for Slovene-English language pair RSDO-DS4-NMT 1.2.6
License: https://opensource.org/licenses/Apache-2.0

Lebar Bajec, Iztok; Repar, Andraž; Demšar, Jure; Bajec, Žan; Rizvič, Mitja; Kumperščak, Borut; Bajec, Marko

CLARIN.SI data & tools

Authors: Lebar Bajec, Iztok; Repar, Andraž; Demšar, Jure; Bajec, Žan; Rizvič, Mitja; Kumperščak, Borut; Bajec, Marko

Item identifier: http://hdl.handle.net/11356/1736

Project URL: https://rsdo.slovenscina.eu/en/machine-translation

Demo URL: https://www.slovenscina.eu/en/prevajalnik

Referenced by: https://github.com/clarinsi/Slovene_NMT

Date issued: 2022-12-01

Type: toolService

Language(s): English, Slovenian

Description: This Neural Machine Translation model for the Slovene-English language pair was trained following the NVIDIA NeMo NMT AAYN ("Attention Is All You Need") recipe (for details see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation/machine_translation.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It translates text written in Slovene into English and vice versa. The training corpus was built from publicly available datasets, including Parallel corpus EN-SL RSDO4 1.0 (https://www.clarin.si/repository/xmlui/handle/11356/1457), as well as a small portion of proprietary data. In total, the training corpus consisted of 32,638,758 translation pairs and the validation corpus of 8,163 translation pairs. The model was trained on 64 GPUs. On the validation corpus it reached a SacreBLEU score of 48.3191 (at epoch 37) for translation from Slovene to English and a SacreBLEU score of 53.8191 (at epoch 47) for translation from English to Slovene.
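
To give a rough sense of what the reported scores measure, the following is a minimal sketch of a corpus-level BLEU computation in plain Python. It is an illustration only, not the SacreBLEU implementation: it assumes one reference per sentence, whitespace tokenization, and no smoothing, so its output will not reproduce the figures above. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not SacreBLEU's behaviour): one reference per hypothesis,
    whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Made-up sentences, used only to exercise the function.
    system_output = ["the cat sat on the mat"]
    reference = ["the cat sat on a mat"]
    print(round(corpus_bleu(system_output, reference), 2))
```

For real evaluation, the SacreBLEU tool itself should be used, since its standardized tokenization is what makes scores comparable across systems.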