ELMo language model (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as the vocabulary during training. The model can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network takes its input at the character level.
dc.language.iso
slv
dc.publisher
Faculty of Computer and Information Science, University of Ljubljana
dc.relation
info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreplacedby
http://hdl.handle.net/11356/1277
dc.rights
Apache License 2.0
dc.rights.uri
https://opensource.org/licenses/Apache-2.0
dc.rights.label
PUB
dc.source.uri
http://embeddia.eu
dc.subject
ELMo
dc.subject
contextual embeddings
dc.subject
word embeddings
dc.title
ELMo embeddings model, Slovenian
dc.type
toolService
metashare.ResourceInfo#ContentInfo.detailedType
other
metashare.ResourceInfo#ContentInfo.mediaType
text
has.files
yes
branding
CLARIN.SI data & tools
contact.person
Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor
European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153