dc.contributor.author Knez, Timotej
dc.contributor.author Prezelj, Tim
dc.contributor.author Žitnik, Slavko
dc.date.accessioned 2023-11-12T13:56:11Z
dc.date.available 2023-11-12T13:56:11Z
dc.date.issued 2023-11-11
dc.identifier.uri http://hdl.handle.net/11356/1894
dc.description Pretrained language models for detecting and classifying sex education concepts in Slovene curriculum documents. The models are PyTorch neural networks intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers). They are based on SloBERTa 2.0, a Slovenian RoBERTa contextual embeddings model (http://hdl.handle.net/11356/1397), and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code and usage examples are available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded with the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library; an example is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py. The corpus on which these models were trained is available at http://hdl.handle.net/11356/1895.
dc.language.iso slv
dc.publisher CLARIN.SI
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/TimotejK/SemSex
dc.subject language model
dc.subject education
dc.subject sex ed
dc.subject knowledge extraction
dc.subject natural language processing
dc.title Pretrained models for recognising sex education concepts SemSEX 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
files.count 5
files.size 2399406389
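
The loading step named in dc.description can be sketched as follows. This is a minimal sketch, not the authors' pipeline (that is in full_pipeline.py in the GitHub repository). It assumes that each archive has been extracted into the working directory under the folder name shown in its file preview, and that transformers and torch are installed. The names MODEL_DIRS, load_classifier and classify are hypothetical helpers introduced here, and the input sentence is only illustrative. The meaning of the predicted label depends on the id2label mapping stored in each model's config.json, which this record does not describe.

```python
from pathlib import Path

# Folder names as listed in the file previews of this record.
# Assumption: each zip was extracted into the current working directory.
MODEL_DIRS = {
    ("detect", "SloBERTa"): "binary_classifier_SloBerta",
    ("detect", "CroSloEngual"): "binary_classifier_CroSloEngual",
    ("classify", "SloBERTa"): "concept_classifier_SloBerta",
    ("classify", "CroSloEngual"): "concept_classifier_CroSloEngual",
}


def load_classifier(model_dir):
    """Load the tokenizer and the sequence classifier from a local folder."""
    # Imported here so the module can be inspected without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSequenceClassification.from_pretrained(str(model_dir))
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def classify(text, tokenizer, model):
    """Return the label with the highest score for one piece of text."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label_id = int(logits.argmax(dim=-1))
    # Fall back to the numeric id if the config carries no label names.
    return model.config.id2label.get(label_id, str(label_id))


if __name__ == "__main__":
    model_dir = Path(MODEL_DIRS[("detect", "SloBERTa")])
    if model_dir.is_dir():
        tokenizer, model = load_classifier(model_dir)
        # Illustrative input only; not taken from the training corpus.
        print(classify("Učenci spoznajo spremembe v puberteti.", tokenizer, model))
    else:
        print(f"Extract the archive first: {model_dir} not found")
```

The same two helpers should work for all four archives, since each one ships its own tokenizer files alongside the model weights.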


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).

Name: SemSEX.ttl
Size: 191.22 KB
Format: Unknown
Description: SemSex ontology
MD5: 86af6728344f1434d537a9aacfabc22c

Name: concept_classifier_SloBerta.zip
Size: 363.71 MB
Format: application/zip
Description: SloBerta based classifier for classifying concepts
MD5: 1f8520a17579b1dec5e2f2fec18334b3
File preview:
  • concept_classifier_SloBerta
    • config.json (1 kB)
    • training_args.bin (2 kB)
    • tokenizer_config.json (505 B)
    • tokenizer.json (2 MB)
    • special_tokens_map.json (298 B)
    • pytorch_model.bin (422 MB)
    • sentencepiece.bpe.model (781 kB)

Name: concept_classifier_CroSloEngual.zip
Size: 440.71 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for classifying concepts
MD5: f0340e83590c576ea86d4c5fba712180
File preview:
  • concept_classifier_CroSloEngual
    • config.json (1 kB)
    • training_args.bin (2 kB)
    • tokenizer_config.json (370 B)
    • tokenizer.json (1 MB)
    • special_tokens_map.json (112 B)
    • pytorch_model.bin (473 MB)
    • vocab.txt (321 kB)
    • sentencepiece.bpe.model (781 kB)

Name: binary_classifier_SloBerta.zip
Size: 703.92 MB
Format: application/zip
Description: SloBerta based classifier for detecting concepts
MD5: 4a956f4ec4587806b774367b708dd958
File preview:
  • binary_classifier_SloBerta
    • sentencepiece.bpe.model (781 kB)
    • pytorch_model.bin (422 MB)
    • tokenizer_config.json (505 B)
    • config.json (779 B)
    • training_args.bin (2 kB)
    • model.safetensors (387 MB)
    • tokenizer.json (2 MB)
    • vocab.txt (321 kB)
    • special_tokens_map.json (298 B)

Name: binary_classifier_CroSloEngual.zip
Size: 779.73 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for detecting concepts
MD5: 16aa8b4744091a201cbeeec679a6336a
File preview:
  • binary_classifier_CroSloEngual
    • sentencepiece.bpe.model (781 kB)
    • pytorch_model.bin (473 MB)
    • tokenizer_config.json (370 B)
    • config.json (701 B)
    • training_args.bin (2 kB)
    • model.safetensors (387 MB)
    • tokenizer.json (1 MB)
    • vocab.txt (321 kB)
    • special_tokens_map.json (112 B)
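
The MD5 values listed for each file can be used to check a download before extracting it. The following is a minimal sketch using only the Python standard library. It assumes the files were saved under their original names in a single directory; md5_of and verify_downloads are hypothetical helper names introduced here, not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "SemSEX.ttl": "86af6728344f1434d537a9aacfabc22c",
    "concept_classifier_SloBerta.zip": "1f8520a17579b1dec5e2f2fec18334b3",
    "concept_classifier_CroSloEngual.zip": "f0340e83590c576ea86d4c5fba712180",
    "binary_classifier_SloBerta.zip": "4a956f4ec4587806b774367b708dd958",
    "binary_classifier_CroSloEngual.zip": "16aa8b4744091a201cbeeec679a6336a",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Reading in chunks keeps memory use flat, which matters here because the largest archive is close to 780 MB.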
