Pretrained models for recognising sex education concepts SemSEX 1.0

License: https://creativecommons.org/licenses/by/4.0/

Knez, Timotej; Prezelj, Tim; Žitnik, Slavko

dc.contributor.author	Knez, Timotej
dc.contributor.author	Prezelj, Tim
dc.contributor.author	Žitnik, Slavko
dc.date.accessioned	2023-11-12T13:56:11Z
dc.date.available	2023-11-12T13:56:11Z
dc.date.issued	2023-11-11
dc.identifier.uri	http://hdl.handle.net/11356/1894
dc.description	Pretrained language models for detecting the presence of sex education concepts in Slovene curriculum documents and for classifying those concepts. The models are PyTorch neural networks intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers). They are based on SloBERTa 2.0, the Slovenian RoBERTa contextual embeddings model (http://hdl.handle.net/11356/1397), and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code of the models and example usage are available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded with the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library; an example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py. The corpus on which these models were trained is available at http://hdl.handle.net/11356/1895.
dc.language.iso	slv
dc.publisher	CLARIN.SI
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.source.uri	https://github.com/TimotejK/SemSex
dc.subject	language model
dc.subject	education
dc.subject	sex ed
dc.subject	knowledge extraction
dc.subject	natural language processing
dc.title	Pretrained models for recognising sex education concepts SemSEX 1.0
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Timotej Knez, timotej.knez@fri.uni-lj.si, Faculty of Computer and Information Science, University of Ljubljana
sponsor	Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
files.count	5
files.size	2399406389
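
The description above names AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() as the loading functions. A minimal sketch of that loading step follows. It is not the authors' pipeline (see full_pipeline.py in the repository for that), and it assumes that an archive such as binary_classifier_SloBerta.zip has been unpacked into a local directory of the same name. The helper names load_classifier, predict_label_ids and ids_to_labels are hypothetical, as is the example sentence, and the label names stored in each model's config.json are not documented in this record.

```python
from pathlib import Path


def load_classifier(model_dir):
    """Load the tokenizer and classifier from an unpacked archive directory."""
    # Imported here so the helpers below stay importable without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSequenceClassification.from_pretrained(str(model_dir))
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_label_ids(texts, tokenizer, model):
    """Return the index of the highest-scoring class for each input text."""
    import torch

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()


def ids_to_labels(label_ids, id2label):
    """Map class indices to label names, falling back to the index as text."""
    return [id2label.get(i, str(i)) for i in label_ids]


if __name__ == "__main__":
    # Assumed extraction path; adjust to wherever the archive was unpacked.
    model_dir = Path("binary_classifier_SloBerta")
    if model_dir.is_dir():
        tokenizer, model = load_classifier(model_dir)
        # Illustrative Slovene sentence, not taken from the training corpus.
        ids = predict_label_ids(["Učenci spoznajo razvoj človeka."], tokenizer, model)
        print(ids_to_labels(ids, model.config.id2label))
    else:
        print(f"Model directory not found: {model_dir}")
```

The same functions should apply to the other three archives, since all four are described as sequence classification models; only the directory name changes.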

Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: SemSEX.ttl
Size: 191.22 KB
Format: Unknown
Description: SemSex ontology
MD5: 86af6728344f1434d537a9aacfabc22c

Name: concept_classifier_SloBerta.zip
Size: 363.71 MB
Format: application/zip
Description: SloBerta based classifier for classifying concepts
MD5: 1f8520a17579b1dec5e2f2fec18334b3

concept_classifier_SloBerta
- config.json (1 kB)
- training_args.bin (2 kB)
- tokenizer_config.json (505 B)
- tokenizer.json (2 MB)
- special_tokens_map.json (298 B)
- pytorch_model.bin (422 MB)
- sentencepiece.bpe.model (781 kB)

Name: concept_classifier_CroSloEngual.zip
Size: 440.71 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for classifying concepts
MD5: f0340e83590c576ea86d4c5fba712180

concept_classifier_CroSloEngual
- config.json (1 kB)
- training_args.bin (2 kB)
- tokenizer_config.json (370 B)
- tokenizer.json (1 MB)
- special_tokens_map.json (112 B)
- pytorch_model.bin (473 MB)
- vocab.txt (321 kB)
- sentencepiece.bpe.model (781 kB)

Name: binary_classifier_SloBerta.zip
Size: 703.92 MB
Format: application/zip
Description: SloBerta based classifier for detecting concepts
MD5: 4a956f4ec4587806b774367b708dd958

binary_classifier_SloBerta
- sentencepiece.bpe.model (781 kB)
- pytorch_model.bin (422 MB)
- tokenizer_config.json (505 B)
- config.json (779 B)
- training_args.bin (2 kB)
- model.safetensors (387 MB)
- tokenizer.json (2 MB)
- vocab.txt (321 kB)
- special_tokens_map.json (298 B)

Name: binary_classifier_CroSloEngual.zip
Size: 779.73 MB
Format: application/zip
Description: CroSloEngual BERT based classifier for detecting concepts
MD5: 16aa8b4744091a201cbeeec679a6336a

binary_classifier_CroSloEngual
- sentencepiece.bpe.model (781 kB)
- pytorch_model.bin (473 MB)
- tokenizer_config.json (370 B)
- config.json (701 B)
- training_args.bin (2 kB)
- model.safetensors (387 MB)
- tokenizer.json (1 MB)
- vocab.txt (321 kB)
- special_tokens_map.json (112 B)
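
Each file above is listed with an MD5 checksum, which can be used to confirm that a download completed intact. A minimal sketch of such a check using Python's standard hashlib module is given below. The function names md5_of_file and verify_download are hypothetical, and the sketch assumes the downloaded files keep the names shown in this listing.

```python
import hashlib

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "SemSEX.ttl": "86af6728344f1434d537a9aacfabc22c",
    "concept_classifier_SloBerta.zip": "1f8520a17579b1dec5e2f2fec18334b3",
    "concept_classifier_CroSloEngual.zip": "f0340e83590c576ea86d4c5fba712180",
    "binary_classifier_SloBerta.zip": "4a956f4ec4587806b774367b708dd958",
    "binary_classifier_CroSloEngual.zip": "16aa8b4744091a201cbeeec679a6336a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, verify_download("SemSEX.ttl", EXPECTED_MD5["SemSEX.ttl"]) returns True only if the local copy matches the checksum published here. Reading in chunks keeps memory use low for the larger archives.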
