dc.contributor.author | Knez, Timotej |
dc.contributor.author | Prezelj, Tim |
dc.contributor.author | Žitnik, Slavko |
dc.date.accessioned | 2023-11-12T13:56:11Z |
dc.date.available | 2023-11-12T13:56:11Z |
dc.date.issued | 2023-11-11 |
dc.identifier.uri | http://hdl.handle.net/11356/1894 |
dc.description | Pretrained language models for detecting and classifying sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers). They are based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397) and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code for the models and usage examples are available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded with the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library. An example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py. The corpus on which these models were trained is available at http://hdl.handle.net/11356/1895. |
dc.language.iso | slv |
dc.publisher | CLARIN.SI |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/TimotejK/SemSex |
dc.subject | language model |
dc.subject | education |
dc.subject | sex ed |
dc.subject | knowledge extraction |
dc.subject | natural language processing |
dc.title | Pretrained models for recognising sex education concepts SemSEX 1.0 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Timotej Knez, timotej.knez@fri.uni-lj.si, Faculty of Computer and Information Science, University of Ljubljana |
sponsor | Jožef Stefan Institute; CLARIN; CLARIN.SI; nationalFunds |
files.count | 5 |
files.size | 2399406389 |
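
The loading procedure named in dc.description can be sketched as follows. This is a minimal sketch, not the authors' pipeline (that is in full_pipeline.py in the GitHub repository). It assumes one of the archives listed under "Files in this item" has been unpacked into a local directory, that the classifier is single-label so a softmax over the logits is appropriate, and that config.json carries usable label names. The helper names load_classifier, predict_label and missing_files are hypothetical.

```python
from pathlib import Path

# Files that appear in every archive listing of this item.
REQUIRED_FILES = ("config.json", "tokenizer.json", "pytorch_model.bin")


def missing_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def load_classifier(model_dir):
    """Load tokenizer and model from an unpacked archive directory."""
    # Imported here so the file check above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_label(text, tokenizer, model):
    """Return (label, probability) for the highest-scoring class.

    Assumption: single-label classification. If config.json has no custom
    label names, the label will be a generic one such as "LABEL_0".
    """
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    best = int(probs.argmax())
    return model.config.id2label[best], float(probs[best])


# Example use (the directory name is an assumption about where the zip was unpacked):
#   tokenizer, model = load_classifier("binary_classifier_SloBerta")
#   print(predict_label("Učenci spoznajo razvoj človeškega telesa.", tokenizer, model))
```

Calling missing_files first gives a clearer error than a failed from_pretrained() call when an archive was only partially extracted.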
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

- Name: SemSEX.ttl
- Size: 191.22 KB
- Format: Unknown
- Description: SemSex ontology
- MD5: 86af6728344f1434d537a9aacfabc22c

- Name: concept_classifier_SloBerta.zip
- Size: 363.71 MB
- Format: application/zip
- Description: SloBerta based classifier for classifying concepts
- MD5: 1f8520a17579b1dec5e2f2fec18334b3
- concept_classifier_SloBerta/
  - config.json (1 kB)
  - training_args.bin (2 kB)
  - tokenizer_config.json (505 B)
  - tokenizer.json (2 MB)
  - special_tokens_map.json (298 B)
  - pytorch_model.bin (422 MB)
  - sentencepiece.bpe.model (781 kB)

- Name: concept_classifier_CroSloEngual.zip
- Size: 440.71 MB
- Format: application/zip
- Description: CroSloEngual BERT based classifier for classifying concepts
- MD5: f0340e83590c576ea86d4c5fba712180
- concept_classifier_CroSloEngual/
  - config.json (1 kB)
  - training_args.bin (2 kB)
  - tokenizer_config.json (370 B)
  - tokenizer.json (1 MB)
  - special_tokens_map.json (112 B)
  - pytorch_model.bin (473 MB)
  - vocab.txt (321 kB)
  - sentencepiece.bpe.model (781 kB)

- Name: binary_classifier_SloBerta.zip
- Size: 703.92 MB
- Format: application/zip
- Description: SloBerta based classifier for detecting concepts
- MD5: 4a956f4ec4587806b774367b708dd958
- binary_classifier_SloBerta/
  - sentencepiece.bpe.model (781 kB)
  - pytorch_model.bin (422 MB)
  - tokenizer_config.json (505 B)
  - config.json (779 B)
  - training_args.bin (2 kB)
  - model.safetensors (387 MB)
  - tokenizer.json (2 MB)
  - vocab.txt (321 kB)
  - special_tokens_map.json (298 B)

- Name: binary_classifier_CroSloEngual.zip
- Size: 779.73 MB
- Format: application/zip
- Description: CroSloEngual BERT based classifier for detecting concepts
- MD5: 16aa8b4744091a201cbeeec679a6336a
- binary_classifier_CroSloEngual/
  - sentencepiece.bpe.model (781 kB)
  - pytorch_model.bin (473 MB)
  - tokenizer_config.json (370 B)
  - config.json (701 B)
  - training_args.bin (2 kB)
  - model.safetensors (387 MB)
  - tokenizer.json (1 MB)
  - vocab.txt (321 kB)
  - special_tokens_map.json (112 B)
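
The MD5 values listed for each file can be used to check a download before unpacking it. The following is a minimal sketch using only the Python standard library. The checksums are copied from the listing above. The function names md5_of_file and verify_download are hypothetical, and the sketch assumes each file was saved under the name shown in the listing.

```python
import hashlib

# MD5 checksums as published in the file listing of this item.
EXPECTED_MD5 = {
    "SemSEX.ttl": "86af6728344f1434d537a9aacfabc22c",
    "concept_classifier_SloBerta.zip": "1f8520a17579b1dec5e2f2fec18334b3",
    "concept_classifier_CroSloEngual.zip": "f0340e83590c576ea86d4c5fba712180",
    "binary_classifier_SloBerta.zip": "4a956f4ec4587806b774367b708dd958",
    "binary_classifier_CroSloEngual.zip": "16aa8b4744091a201cbeeec679a6336a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file at path matches the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()


# Example use (assumes the archive is in the current directory):
#   name = "binary_classifier_SloBerta.zip"
#   print(verify_download(name, EXPECTED_MD5[name]))
```

Reading in chunks keeps memory use flat, which matters for archives of several hundred megabytes. MD5 here only guards against corrupted or truncated downloads, not against tampering.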