ELMo embeddings model, Slovenian

Name: ELMo embeddings model, Slovenian
License: https://opensource.org/licenses/Apache-2.0

Ulčar, Matej

dc.contributor.author	Ulčar, Matej
dc.date.accessioned	2019-10-15T09:10:40Z
dc.date.available	2019-10-15T09:10:40Z
dc.date.issued	2019-10-15
dc.identifier.uri	http://hdl.handle.net/11356/1257
dc.description	ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as the vocabulary during training. The model can also produce embeddings for out-of-vocabulary (OOV) words, because the neural network takes its input at the character level.
dc.language.iso	slv
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.relation	info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreplacedby	http://hdl.handle.net/11356/1277
dc.rights	Apache License 2.0
dc.rights.uri	https://opensource.org/licenses/Apache-2.0
dc.rights.label	PUB
dc.source.uri	http://embeddia.eu
dc.subject	ELMo
dc.subject	contextual embeddings
dc.subject	word embeddings
dc.title	ELMo embeddings model, Slovenian
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	other
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor	European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
size.info	213 MB
size.info	1364064 tokens
files.count	2
files.size	223309306
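
The description above notes that the model can handle OOV words because its input is at the character level. The following is a minimal sketch of that idea, not the model's actual preprocessing code: each word becomes a fixed-length sequence of UTF-8 byte values framed by word-boundary markers, so any string can be encoded whether or not it is in the vocabulary. The constants (maximum word length of 50, marker ids 258, 259 and 260) are assumptions modeled on the bilm-tf character vocabulary and are not stated in this record; the function name word_to_char_ids is hypothetical.

```python
# Assumed constants, modeled on bilm-tf conventions; they are NOT given
# in this record and may differ from the values in options.json.
MAX_WORD_LENGTH = 50   # assumed fixed number of character slots per word
BOW_ID = 258           # assumed "beginning of word" marker
EOW_ID = 259           # assumed "end of word" marker
PAD_ID = 260           # assumed padding id


def word_to_char_ids(word, max_len=MAX_WORD_LENGTH):
    """Encode a word as a fixed-length list of character (byte) ids.

    Hypothetical helper for illustration only. The word is encoded as
    UTF-8, truncated so the two boundary markers still fit, and padded
    on the right up to max_len.
    """
    raw = word.encode("utf-8", "ignore")[: max_len - 2]
    ids = [PAD_ID] * max_len
    ids[0] = BOW_ID
    for position, byte_value in enumerate(raw, start=1):
        ids[position] = byte_value
    ids[len(raw) + 1] = EOW_ID
    return ids


if __name__ == "__main__":
    # A Slovenian word: "č" occupies two bytes in UTF-8.
    print(word_to_char_ids("čebela")[:10])
```

Because the encoding depends only on the bytes of the word, a token that never occurred in the training vocabulary still receives a valid input representation.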

Files in this item

This item is Publicly Available and licensed under the Apache License 2.0.

Name: gigafida_weights.hdf5
Size: 212.96 MB
Format: Unknown
Description: PyTorch weights of the model
MD5: e312065555a9cc3d55b6a4c86138b498

Name: options.json
Size: 546 bytes
Format: Unknown
Description: options file used for training and inference
MD5: c6212def0ff5cf2feb1054b475b7e8c0
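
The MD5 values listed for the two files can be used to check that a download completed without corruption. The sketch below assumes both files were saved under their listed names in a single local directory; the helper names md5_of and verify_downloads are hypothetical, and only the file names and checksums come from this record.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "gigafida_weights.hdf5": "e312065555a9cc3d55b6a4c86138b498",
    "options.json": "c6212def0ff5cf2feb1054b475b7e8c0",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, wanted in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == wanted
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use small, which matters for the weights file at roughly 213 MB.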