dc.contributor.author |
Ulčar, Matej |
dc.date.accessioned |
2019-10-15T09:10:40Z |
dc.date.available |
2019-10-15T09:10:40Z |
dc.date.issued |
2019-10-15 |
dc.identifier.uri |
http://hdl.handle.net/11356/1257 |
dc.description |
ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as the vocabulary during training. Because the neural network's input is at the character level, the model can also produce embeddings for out-of-vocabulary (OOV) words. |
dc.language.iso |
slv |
dc.publisher |
Faculty of Computer and Information Science, University of Ljubljana |
dc.relation |
info:eu-repo/grantAgreement/EC/H2020/825153 |
dc.relation.isreplacedby |
http://hdl.handle.net/11356/1277 |
dc.rights |
Apache License 2.0 |
dc.rights.uri |
https://opensource.org/licenses/Apache-2.0 |
dc.rights.label |
PUB |
dc.source.uri |
http://embeddia.eu |
dc.subject |
ELMo |
dc.subject |
contextual embeddings |
dc.subject |
word embeddings |
dc.title |
ELMo embeddings model, Slovenian |
dc.type |
toolService |
metashare.ResourceInfo#ContentInfo.detailedType |
other |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
contact.person |
Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana |
sponsor |
European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153 |
size.info |
213 MB |
size.info |
1364064 tokens |
files.count |
2 |
files.size |
223309306 |