dc.contributor.author Ulčar, Matej
dc.date.accessioned 2019-11-25T14:34:36Z
dc.date.available 2019-11-25T14:34:36Z
dc.date.issued 2019-11-25
dc.identifier.uri http://hdl.handle.net/11356/1277
dc.description ELMo language models (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each language's model was trained for approximately 10 epochs. The training corpora range in size from over 270 M tokens for Latvian to almost 2 B tokens for Croatian. For each language model, a vocabulary of about 1 million of the most common tokens was provided during training. The models can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is in its own .tar.gz archive, consisting of two files: PyTorch weights (.hdf5) and options (.json). Both are needed for model inference with the allennlp (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md) Python library.
dc.language.iso slv
dc.language.iso hrv
dc.language.iso fin
dc.language.iso est
dc.language.iso lav
dc.language.iso lit
dc.language.iso swe
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://arxiv.org/abs/1911.10049
dc.relation.replaces http://hdl.handle.net/11356/1257
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri http://embeddia.eu
dc.subject ELMo
dc.subject contextual embeddings
dc.subject word embeddings
dc.title ELMo embeddings models for seven languages
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
size.info 7 files
size.info 1.4 GB
files.count 7
files.size 1450271450
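
The description above says that each archive holds an options file and a weights file, and that both are needed for inference with allennlp. A minimal sketch of that workflow follows. The helper names `unpack_model` and `embed_tokens` are hypothetical, and the sketch assumes an older allennlp release that still ships `allennlp.commands.elmo.ElmoEmbedder`; it is an illustration under those assumptions, not the author's own loading code.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path, target_dir):
    """Extract a <language>-elmo.tar.gz archive and return (options, weights) paths.

    Hypothetical helper: it relies only on the layout shown in the file
    previews, i.e. one options.json and one .hdf5 weights file per archive.
    """
    target = Path(target_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target)
    # Search recursively, because the two files sit in a per-language folder.
    options = next(target.rglob("options.json"))
    weights = next(target.rglob("*.hdf5"))
    return options, weights


def embed_tokens(options, weights, tokens):
    """Return ELMo vectors for one already tokenized sentence.

    Assumption: an allennlp version that provides
    allennlp.commands.elmo.ElmoEmbedder is installed.
    """
    # Imported lazily so that unpack_model works without allennlp installed.
    from allennlp.commands.elmo import ElmoEmbedder

    embedder = ElmoEmbedder(str(options), str(weights))
    # embed_sentence takes a list of token strings and returns a NumPy array.
    return embedder.embed_sentence(tokens)
```

For example, calling `unpack_model("slovenian-elmo.tar.gz", "models")` and passing the two returned paths to `embed_tokens` together with a token list such as `["Dober", "dan"]` would yield the contextual vectors for that sentence.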


 Files in this item

This item is Publicly Available and licensed under: Apache License 2.0
Name slovenian-elmo.tar.gz
Size 197.54 MB
Format application/gzip
Description Slovenian ELMo model
MD5 7743a0470fa24ee8cd010434151aef84
Contents
  • slovenian
    • options.json (546 B)
    • slovenian-elmo-weights.hdf5 (212 MB)

Name croatian-elmo.tar.gz
Size 197.63 MB
Format application/gzip
Description Croatian ELMo model
MD5 e8e708625a4a056968ec41fec33b00d1
Contents
  • croatian
    • options.json (546 B)
    • croatian-elmo-weights.hdf5 (212 MB)

Name finnish-elmo.tar.gz
Size 197.62 MB
Format application/gzip
Description Finnish ELMo model
MD5 7fbf313c4f96d46b9589e8f13e879a1d
Contents
  • finnish
    • options.json (545 B)
    • finnish-elmo-weights.hdf5 (212 MB)

Name estonian-elmo.tar.gz
Size 197.6 MB
Format application/gzip
Description Estonian ELMo model
MD5 563fb18ce5af956e2716ff982b55feab
Contents
  • estonian
    • options.json (545 B)
    • estonian-elmo-weights.hdf5 (212 MB)

Name lithuanian-elmo.tar.gz
Size 197.55 MB
Format application/gzip
Description Lithuanian ELMo model
MD5 f2a02c1033301c24d8e64d64b22507a0
Contents
  • lithuanian
    • options.json (547 B)
    • lithuanian-elmo-weights.hdf5 (212 MB)

Name latvian-elmo.tar.gz
Size 197.55 MB
Format application/gzip
Description Latvian ELMo model
MD5 62e503858368e586157c1df3627498fe
Contents
  • latvian
    • options.json (544 B)
    • latvian-elmo-weights.hdf5 (212 MB)

Name swedish-elmo.tar.gz
Size 197.6 MB
Format application/gzip
Description Swedish ELMo model
MD5 cdff6414c13f5d507995283c6c449060
Contents
  • swedish
    • options.json (547 B)
    • swedish-elmo-weights.hdf5 (212 MB)
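
The MD5 values listed for each archive can be used to check a download before unpacking it. The sketch below shows one way to do that with the Python standard library; the names `PUBLISHED_MD5`, `md5_of_file` and `is_intact` are hypothetical, and the checksums are copied from the listing above.

```python
import hashlib

# MD5 checksums as published in the file listing of this record.
PUBLISHED_MD5 = {
    "slovenian-elmo.tar.gz": "7743a0470fa24ee8cd010434151aef84",
    "croatian-elmo.tar.gz": "e8e708625a4a056968ec41fec33b00d1",
    "finnish-elmo.tar.gz": "7fbf313c4f96d46b9589e8f13e879a1d",
    "estonian-elmo.tar.gz": "563fb18ce5af956e2716ff982b55feab",
    "lithuanian-elmo.tar.gz": "f2a02c1033301c24d8e64d64b22507a0",
    "latvian-elmo.tar.gz": "62e503858368e586157c1df3627498fe",
    "swedish-elmo.tar.gz": "cdff6414c13f5d507995283c6c449060",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, file_name):
    """Compare a local file's MD5 with the published value for file_name."""
    return md5_of_file(path) == PUBLISHED_MD5[file_name]
```

A call such as `is_intact("downloads/swedish-elmo.tar.gz", "swedish-elmo.tar.gz")` (the local path is an assumption) returns `True` only when the downloaded archive matches the published checksum. MD5 is suitable here for detecting corrupted transfers, not for defending against deliberate tampering.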
