This Neural Machine Translation model for Slovene-English language pair was trained following the NVIDIA NeMo NMT AAYN recipe (for details see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation/machine_translation.html, and NVIDIA NeMo GitHub repository https://github.com/NVIDIA/NeMo). It provides functionality for translating text written in Slovene language to English and vice versa.
The training corpus was built from publicly available datasets, including Parallel corpus EN-SL RSDO4 1.0 (https://www.clarin.si/repository/xmlui/handle/11356/1457), as well as a small portion of proprietary data. In total the training corpus consisted of 32.638.758 translation pairs and the validation corpus consisted of 8.163 translation pairs. The model was trained on 64GPUs and on the validation corpus reached a SacreBleu score of 48.3191 (at epoch 37) for translation from Slovene to English and a SacreBleu score of 53.8191 (at epoch 47) for translation from English to Slovene.