Show simple item record

 
dc.contributor.author Knez, Timotej
dc.contributor.author Prezelj, Tim
dc.contributor.author Žitnik, Slavko
dc.date.accessioned 2023-11-12T13:56:11Z
dc.date.available 2023-11-12T13:56:11Z
dc.date.issued 2023-11-11
dc.identifier.uri http://hdl.handle.net/11356/1894
dc.description Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers). They are based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397) and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code of the models and example usage are available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded with the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library. An example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py. The corpus on which these models were trained is available at http://hdl.handle.net/11356/1895.
dc.language.iso slv
dc.publisher CLARIN.SI
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/TimotejK/SemSex
dc.subject language model
dc.subject education
dc.subject sex ed
dc.subject knowledge extraction
dc.subject natural language processing
dc.title Pretrained models for recognising sex education concepts SemSEX 1.0
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Timotej Knez timotej.knez@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
files.count 5
files.size 2399406389
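
The description above names AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained() as the way to load the models. A minimal sketch of that loading step follows, assuming one of the archives has been unzipped locally and that the transformers and torch packages are installed. The function names, the directory path in the usage comment, and the choice to report the arg-max label from the model's config are illustrative assumptions, not the authors' pipeline; the linked full_pipeline.py remains the authoritative example.

```python
from pathlib import Path


def load_classifier(model_dir):
    """Load the tokenizer and classifier from an unzipped archive directory."""
    model_dir = Path(model_dir)
    # Fail early with a clear message if the directory is not a model folder.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {model_dir}")

    # Imported lazily so the check above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSequenceClassification.from_pretrained(str(model_dir))
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_label(text, tokenizer, model):
    """Return the highest-scoring label for one piece of text."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    class_id = int(logits.argmax(dim=-1).item())
    # Assumption: the label names stored in config.json are meaningful;
    # fall back to the numeric class id otherwise.
    return model.config.id2label.get(class_id, str(class_id))


# Usage (hypothetical local path to an unzipped archive):
# tokenizer, model = load_classifier("binary_classifier_SloBerta")
# print(predict_label("Primer besedila iz učnega načrta.", tokenizer, model))
```

The same two functions should apply to any of the four archives, since each contains its own config.json and tokenizer files; only the directory passed to load_classifier changes.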


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0); attribution is required.
Name: SemSEX.ttl
Size: 191.22 KB
Format: Unknown
Description: SemSex ontology
MD5: 86af6728344f1434d537a9aacfabc22c

Name: concept_classifier_SloBerta.zip
Size: 363.71 MB
Format: application/zip
Description: SloBerta based classifier for classifying concepts
MD5: 1f8520a17579b1dec5e2f2fec18334b3
Contents:
  • concept_classifier_SloBerta
    • config.json (1 kB)
    • training_args.bin (2 kB)
    • tokenizer_config.json (505 B)
    • tokenizer.json (2 MB)
    • special_tokens_map.json (298 B)
    • pytorch_model.bin (422 MB)
    • sentencepiece.bpe.model (781 kB)

Name: concept_classifier_CroSloEngual.zip
Size: 440.71 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for classifying concepts
MD5: f0340e83590c576ea86d4c5fba712180
Contents:
  • concept_classifier_CroSloEngual
    • config.json (1 kB)
    • training_args.bin (2 kB)
    • tokenizer_config.json (370 B)
    • tokenizer.json (1 MB)
    • special_tokens_map.json (112 B)
    • pytorch_model.bin (473 MB)
    • vocab.txt (321 kB)
    • sentencepiece.bpe.model (781 kB)

Name: binary_classifier_SloBerta.zip
Size: 703.92 MB
Format: application/zip
Description: SloBerta based classifier for detecting concepts
MD5: 4a956f4ec4587806b774367b708dd958
Contents:
  • binary_classifier_SloBerta
    • sentencepiece.bpe.model (781 kB)
    • pytorch_model.bin (422 MB)
    • tokenizer_config.json (505 B)
    • config.json (779 B)
    • training_args.bin (2 kB)
    • model.safetensors (387 MB)
    • tokenizer.json (2 MB)
    • vocab.txt (321 kB)
    • special_tokens_map.json (298 B)

Name: binary_classifier_CroSloEngual.zip
Size: 779.73 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for detecting concepts
MD5: 16aa8b4744091a201cbeeec679a6336a
Contents:
  • binary_classifier_CroSloEngual
    • sentencepiece.bpe.model (781 kB)
    • pytorch_model.bin (473 MB)
    • tokenizer_config.json (370 B)
    • config.json (701 B)
    • training_args.bin (2 kB)
    • model.safetensors (387 MB)
    • tokenizer.json (1 MB)
    • vocab.txt (321 kB)
    • special_tokens_map.json (112 B)
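
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download completed intact. A minimal sketch of such a check follows, using only the Python standard library. The function names and the assumption that the files are saved under their listed names are illustrative; the checksum values are copied from the listing above.

```python
import hashlib

# MD5 values as published in the file listing of this item.
EXPECTED_MD5 = {
    "SemSEX.ttl": "86af6728344f1434d537a9aacfabc22c",
    "concept_classifier_SloBerta.zip": "1f8520a17579b1dec5e2f2fec18334b3",
    "concept_classifier_CroSloEngual.zip": "f0340e83590c576ea86d4c5fba712180",
    "binary_classifier_SloBerta.zip": "4a956f4ec4587806b774367b708dd958",
    "binary_classifier_CroSloEngual.zip": "16aa8b4744091a201cbeeec679a6336a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for the larger archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, name):
    """Return True if the file at `path` has the MD5 listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]


# Usage (assumes the archive was saved in the current directory):
# print(matches_listing("binary_classifier_SloBerta.zip",
#                       "binary_classifier_SloBerta.zip"))
```

MD5 is suitable here only as an integrity check against accidental corruption, which is how the repository listing presents it.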
