Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Anatomy 1.0

Name: Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Anatomy 1.0
License: https://creativecommons.org/licenses/by/4.0/

Plesnik, Emil; Tovornik, Robert; Fabjan, Borut; Korošec, Filip; Žabkar, Ines; Kuzman, Ema; Rigler, Martin; Škufca, Lara

dc.contributor.author	Plesnik, Emil
dc.contributor.author	Tovornik, Robert
dc.contributor.author	Fabjan, Borut
dc.contributor.author	Korošec, Filip
dc.contributor.author	Žabkar, Ines
dc.contributor.author	Kuzman, Ema
dc.contributor.author	Rigler, Martin
dc.contributor.author	Škufca, Lara
dc.date.accessioned	2026-02-16T16:03:19Z
dc.date.available	2026-02-16T16:03:19Z
dc.date.issued	2026-02-03
dc.identifier.uri	http://hdl.handle.net/11356/2085
dc.description	GaMS-Instruct-MED-Anatomy is an instruction-following dataset containing 711,805 prompt-response units in Slovene (with English and Latin terminology). The units form a structured, machine-readable database of Slovenian anatomical terminology for training language models. The collection is based on anatomical data collected, translated and validated by medical experts. The data was processed, structured and enriched with automatic scripts and explanations generated using large language models.
It includes:
• Anatomical terminology in Slovene, English and Latin
• SNOMED CT classification (standardized medical coding system)
• Classification by body systems
• Synonyms and alternative terms (original and generated)
• Popular explanations of anatomical structures for the general public
• Expert explanations of anatomical structures for medical experts
The result is a standardized database in an instructional format that is suitable for use in computational linguistics, natural language processing (NLP), medical informatics and for training and adapting large language models.
The corpus is intended for research and development of fine-tuning language models, training and adapting large language models for the medical and anatomical domains, development of medical chatbots and assistants in Slovene, support for healthcare professionals in anatomical terminology, translation of medical documentation, standardization of medical terminology in Slovene, and education in the field of anatomy.
For more details on the structure of the dataset, please consult 00README.txt.
dc.language.iso	slv
dc.publisher	Better, d.o.o.
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.source.uri	https://www.cjvt.si/povejmo/
dc.subject	instruction following dataset
dc.subject	medical texts
dc.subject	large language models
dc.subject	anatomy
dc.title	Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Anatomy 1.0
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Borut Fabjan info@better.care Better, d.o.o.
sponsor	ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info	711805 units
files.count	1
files.size	30409379

Files in this item

This item is

Publicly Available

and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: GaMS-Instruct-MED-Anatomy_1.0.zip
Size: 29 MB
Format: application/zip
Description: JSON + JSONL
MD5: 46b8b74cfdbe967852d3bf233cdafd76

File Preview

GaMS-Instruct-MED-Anatomy_1.0
- GaMS-Instruct-MED-Anatomy_1.0_documentation.md (20 kB)
- GaMS-Instruct-MED-Anatomy_1.0_documentation.pdf (243 kB)
- GaMS-Instruct-MED-Anatomy_1.0_statistics.json (3 kB)
- GaMS-Instruct-MED-Anatomy_1.0_documentation.docx (48 kB)
- GaMS-Instruct-MED-Anatomy_1.0.jsonl (340 MB)
- GaMS-Instruct-MED-Anatomy_1.0.json (380 MB)
- 00README.txt (15 kB)
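
Because the extracted JSONL file is listed at 340 MB, it may be more practical to stream it line by line than to load it whole. The sketch below makes no assumption about field names (the actual schema is described in 00README.txt); it only counts records and tallies whatever top-level keys appear. The function name and the `limit` parameter are illustrative, and it assumes one JSON object per non-empty line.

```python
import json
from collections import Counter


def summarize_jsonl(path, limit=None):
    """Count records and tally top-level keys without assuming a schema."""
    key_counts = Counter()
    total = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            total += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
            if limit is not None and total >= limit:
                break  # stop early when only a sample is wanted
    return total, key_counts
```

Running `summarize_jsonl("GaMS-Instruct-MED-Anatomy_1.0.jsonl", limit=1000)` on the extracted file gives a quick look at the keys in use. If each line holds one prompt-response unit, a full pass should report the 711,805 units stated in this record, though that correspondence is an assumption here.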
