
 
dc.contributor.author Ulčar, Matej
dc.date.accessioned 2019-11-25T14:34:36Z
dc.date.available 2019-11-25T14:34:36Z
dc.date.issued 2019-11-25
dc.identifier.uri http://hdl.handle.net/11356/1277
dc.description ELMo language models (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on large monolingual corpora for seven languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each model was trained for approximately 10 epochs. Training corpus sizes range from over 270 M tokens for Latvian to almost 2 B tokens for Croatian. For each language, about 1 million of the most common tokens were provided as the vocabulary during training. The models can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is distributed in its own .tar.gz archive containing two files: PyTorch weights (.hdf5) and options (.json). Both are needed for model inference with the allennlp Python library (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md).
dc.language.iso slv
dc.language.iso hrv
dc.language.iso fin
dc.language.iso est
dc.language.iso lav
dc.language.iso lit
dc.language.iso swe
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://arxiv.org/abs/1911.10049
dc.relation.replaces http://hdl.handle.net/11356/1257
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri http://embeddia.eu
dc.subject ELMo
dc.subject contextual embeddings
dc.subject word embeddings
dc.title ELMo embeddings models for seven languages
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
size.info 7 files
size.info 1.4 GB
files.count 7
files.size 1450271450
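
The description states that both the options (.json) and weights (.hdf5) files are needed for inference with allennlp. A minimal sketch of that loading step follows, assuming an allennlp release that provides `allennlp.modules.elmo` and a model directory that has already been unpacked. The helper names `find_model_files` and `embed_sentences` are illustrative and not part of this resource.

```python
from pathlib import Path


def find_model_files(model_dir):
    """Locate the options (.json) and weights (.hdf5) files in an unpacked model directory."""
    model_dir = Path(model_dir)
    options = sorted(model_dir.glob("*.json"))
    weights = sorted(model_dir.glob("*.hdf5"))
    if len(options) != 1 or len(weights) != 1:
        raise FileNotFoundError(
            f"expected exactly one .json and one .hdf5 file in {model_dir}"
        )
    return str(options[0]), str(weights[0])


def embed_sentences(model_dir, sentences):
    """Return contextual embeddings for a batch of pre-tokenized sentences.

    Assumption: allennlp is installed and exposes allennlp.modules.elmo.
    """
    # Imported lazily so find_model_files works without allennlp installed.
    from allennlp.modules.elmo import Elmo, batch_to_ids

    options_file, weight_file = find_model_files(model_dir)
    # One output representation (a learned mix of the ELMo layers), no dropout.
    elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)
    # Tokens are converted to character ids, so OOV words can be embedded too.
    character_ids = batch_to_ids(sentences)
    output = elmo(character_ids)
    # Tensor of shape (batch, max_tokens, embedding_dim).
    return output["elmo_representations"][0]
```

For example, `embed_sentences("slovenian", [["To", "je", "stavek", "."]])` would embed one tokenized Slovenian sentence, assuming the archive was unpacked into a directory named `slovenian`.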


Files in this item

This item is Publicly Available, licensed under the Apache License 2.0.
Name: slovenian-elmo.tar.gz
Size: 197.54 MB
Format: application/gzip
Description: Slovenian ELMo model
MD5: 7743a0470fa24ee8cd010434151aef84
File preview:
  • slovenian
    • options.json (546 B)
    • slovenian-elmo-weights.hdf5 (212 MB)
Name: croatian-elmo.tar.gz
Size: 197.63 MB
Format: application/gzip
Description: Croatian ELMo model
MD5: e8e708625a4a056968ec41fec33b00d1
File preview:
  • croatian
    • options.json (546 B)
    • croatian-elmo-weights.hdf5 (212 MB)
Name: finnish-elmo.tar.gz
Size: 197.62 MB
Format: application/gzip
Description: Finnish ELMo model
MD5: 7fbf313c4f96d46b9589e8f13e879a1d
File preview:
  • finnish
    • options.json (545 B)
    • finnish-elmo-weights.hdf5 (212 MB)
Name: estonian-elmo.tar.gz
Size: 197.6 MB
Format: application/gzip
Description: Estonian ELMo model
MD5: 563fb18ce5af956e2716ff982b55feab
File preview:
  • estonian
    • options.json (545 B)
    • estonian-elmo-weights.hdf5 (212 MB)
Name: lithuanian-elmo.tar.gz
Size: 197.55 MB
Format: application/gzip
Description: Lithuanian ELMo model
MD5: f2a02c1033301c24d8e64d64b22507a0
File preview:
  • lithuanian
    • options.json (547 B)
    • lithuanian-elmo-weights.hdf5 (212 MB)
Name: latvian-elmo.tar.gz
Size: 197.55 MB
Format: application/gzip
Description: Latvian ELMo model
MD5: 62e503858368e586157c1df3627498fe
File preview:
  • latvian
    • options.json (544 B)
    • latvian-elmo-weights.hdf5 (212 MB)
Name: swedish-elmo.tar.gz
Size: 197.6 MB
Format: application/gzip
Description: Swedish ELMo model
MD5: cdff6414c13f5d507995283c6c449060
File preview:
  • swedish
    • options.json (547 B)
    • swedish-elmo-weights.hdf5 (212 MB)
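
The MD5 values listed for each archive can be used to check a download before unpacking it. The sketch below uses only the Python standard library. The checksums are copied from the listing above, while the function names and the local paths in the usage note are illustrative assumptions.

```python
import hashlib
import tarfile
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "slovenian-elmo.tar.gz": "7743a0470fa24ee8cd010434151aef84",
    "croatian-elmo.tar.gz": "e8e708625a4a056968ec41fec33b00d1",
    "finnish-elmo.tar.gz": "7fbf313c4f96d46b9589e8f13e879a1d",
    "estonian-elmo.tar.gz": "563fb18ce5af956e2716ff982b55feab",
    "lithuanian-elmo.tar.gz": "f2a02c1033301c24d8e64d64b22507a0",
    "latvian-elmo.tar.gz": "62e503858368e586157c1df3627498fe",
    "swedish-elmo.tar.gz": "cdff6414c13f5d507995283c6c449060",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive_path, expected_md5, target_dir):
    """Check the archive against its MD5, unpack it, and list the extracted files."""
    actual = md5_of_file(archive_path)
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch for {archive_path}: got {actual}")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    # Relative paths of all extracted files, e.g. "slovenian/options.json".
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())
```

A call such as `verify_and_extract("slovenian-elmo.tar.gz", EXPECTED_MD5["slovenian-elmo.tar.gz"], "models")` would, assuming the archive sits in the working directory, raise on a corrupted download and otherwise return the two extracted file paths.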
