
 
dc.contributor.author Ulčar, Matej
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2020-06-17T09:18:37Z
dc.date.available 2020-06-17T09:18:37Z
dc.date.issued 2020-06-16
dc.identifier.uri http://hdl.handle.net/11356/1317
dc.description Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. It is a state-of-the-art tool that represents words/tokens as contextually dependent word embeddings and is used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT is distributed as neural network weights and configuration files in PyTorch format (i.e., to be used with the PyTorch library).
dc.language.iso hrv
dc.language.iso slv
dc.language.iso eng
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://arxiv.org/abs/2006.07890
dc.relation.isreplacedby http://hdl.handle.net/11356/1330
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://embeddia.eu
dc.subject word embeddings
dc.subject multilingual
dc.subject contextual embeddings
dc.subject BERT
dc.subject language model
dc.title CroSloEngual BERT
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
hidden hidden
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
files.count 3
files.size 499491051
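
The description above says the weights are meant to be used with the PyTorch library and fine-tuned end-to-end. A minimal sketch of loading the three files of this item from a local directory follows. It assumes the Hugging Face `transformers` package is installed (the record itself names only PyTorch), that the files keep their original names, and that the vocabulary is cased (`do_lower_case=False`), which the record does not state. The helper names `missing_files` and `load_crosloengual` are hypothetical.

```python
from pathlib import Path

# File names as listed in this record.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "vocab.txt")


def missing_files(model_dir):
    """Return the names of required files that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


def load_crosloengual(model_dir):
    """Load tokenizer and encoder from a directory holding the three files.

    Assumption: the Hugging Face `transformers` package is available; it is
    imported lazily so the file check above works without it.
    """
    missing = missing_files(model_dir)
    if missing:
        raise FileNotFoundError(f"missing files in {model_dir}: {missing}")

    from transformers import BertModel, BertTokenizer

    # Assumption: cased vocabulary, so lower-casing is switched off.
    tokenizer = BertTokenizer.from_pretrained(model_dir, do_lower_case=False)
    model = BertModel.from_pretrained(model_dir)
    model.eval()  # inference mode; call model.train() before fine-tuning
    return tokenizer, model
```

With the model loaded, contextual embeddings for a sentence could be obtained with `model(**tokenizer("Dober dan", return_tensors="pt")).last_hidden_state`, which yields one vector per subword token.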


 Files in this item

 Download all files in this item (476.35 MB)
This item is Publicly Available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name: pytorch_model.bin
Size: 476.04 MB
Format: Unknown
Description: CroSloEngual BERT model
MD5: 6b26401118943bf61b66a70d2ae68b9d

Name: config.json
Size: 520 bytes
Format: Unknown
Description: Configuration file, describing the model's architecture
MD5: db3bdd5c4db6ffffa9bf3edab2e7be70

Name: vocab.txt
Size: 321.41 KB
Format: Text file
Description: Subword token (WordPiece) vocabulary
MD5: 37b1484f6841d214c3101f878f2473ae
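
The MD5 values listed for the three files can be used to check that a download completed without corruption. The sketch below uses only the Python standard library. The checksums are copied from this record, while the function names `md5_of` and `verify_downloads` and the directory layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksums as published in this record.
EXPECTED_MD5 = {
    "pytorch_model.bin": "6b26401118943bf61b66a70d2ae68b9d",
    "config.json": "db3bdd5c4db6ffffa9bf3edab2e7be70",
    "vocab.txt": "37b1484f6841d214c3101f878f2473ae",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == checksum if path.is_file() else None
    return results
```

Calling `verify_downloads("path/to/download")` returns a dictionary that can be inspected before loading the model. Reading in chunks keeps memory use low for the large weights file.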
 File preview (vocab.txt)
[PAD]
[EOS]
[PAD]
[unused0]
[unused1]
[unused2]
[unused3]
[unused4]
[unused5]
[unused6]
[unused7]
[unused8]
[unused9]
[unused10]
[unused11]
[unused12]
[unused13]
[unused14]
[unused15]
[unused16]
[unused17]
[unused18]
[unused19]
[unused20]
[unused21]
[unused22]
[unused23]
[unused24]
[unused25]
[unused26]
[unused27]
[unused28]
[unused29]
[unused30]
[unused31]
[unused32]
[unused33]
[unused34]
[unused35]
[unused36]
[unused37]
[unused38]
[unused39]
[unused40]
[unused41]
[unused42]
[unused43]
[unused44]
[unused45]
[unused46]
[unused47]
[unused48]
[unused49]
[unused50]
[unused51]
[unused52]
[unused53]
[unused54]
[unused55]
[unused56]
[unused57]
[unused58]
[unused59]
[unused60]
[unused61]
[unused62]
[unused63]
[unused64]
[unused65]
[unused66]
[unused67]
[unused68]
[unused69]
[unused70]
[unused71]
[unused72]
[unused73]
[unused74]
[unused75]
[unused76]
[unused77]
[unused78]
[unused79]
[unused80]
[unused81]
[unused82]
[unused83]
[unused84]
[unused85]
[unused86]
[unused87]
[unused88]
[unused89]
[u . . .
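
The preview above shows the layout of a WordPiece vocabulary file: one token per line, with the zero-based line number serving as the token id. As a rough illustration of how such a file is typically consumed, the sketch below reads a vocabulary and applies greedy longest-match-first WordPiece splitting to a single word. This is the generic BERT-style procedure, not code from this item. The `##` continuation prefix and the `[UNK]` fallback token are BERT conventions assumed here, and neither appears in the truncated preview.

```python
def load_vocab(path):
    """Read a one-token-per-line vocabulary; the id is the zero-based line index.

    If a token occurs on more than one line, the last occurrence wins.
    """
    vocab = {}
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            vocab[line.rstrip("\n")] = index
    return vocab


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest match.

    Assumptions: non-initial pieces carry the '##' prefix, and a word that
    cannot be fully covered is replaced by the `unk` token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces
```

A real tokenizer also performs whitespace and punctuation splitting before this step, so the sketch covers only the subword lookup.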
                                            
