Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Termset 1.0

License: https://creativecommons.org/licenses/by/4.0/

Plesnik, Emil; Tovornik, Robert; Fabjan, Borut; Radnić, Vuk; Marjanović, Anđela; Korošec, Filip; Žabkar, Ines; Kuzman, Ema; Rigler, Martin; Škufca, Lara; Satler, Maša

dc.contributor.author	Plesnik, Emil
dc.contributor.author	Tovornik, Robert
dc.contributor.author	Fabjan, Borut
dc.contributor.author	Radnić, Vuk
dc.contributor.author	Marjanović, Anđela
dc.contributor.author	Korošec, Filip
dc.contributor.author	Žabkar, Ines
dc.contributor.author	Kuzman, Ema
dc.contributor.author	Rigler, Martin
dc.contributor.author	Škufca, Lara
dc.contributor.author	Satler, Maša
dc.date.accessioned	2026-02-16T16:05:55Z
dc.date.available	2026-02-16T16:05:55Z
dc.date.issued	2026-02-03
dc.identifier.uri	http://hdl.handle.net/11356/2089
dc.description	GaMS-Instruct-MED-Termset is an instruction-following dataset containing 975,060 prompt-response units in Slovene from the medical domain. It focuses on medical terms, with explanations for clinical and patient use and examples of their application. The dataset is based on a set of medical terms obtained from Wikidata, accessible via the Wikidata Query Service (https://query.wikidata.org/). The initial set of terms was compared with the terms in the reference Slovenian Medical Dictionary published on Termania (https://www.termania.net/slovarji/95/slovenski-medicinski-slovar). Only matching terms were selected for further processing. The final set of terms was structured and enriched with descriptions generated using large language models (Azure OpenAI, GPT-4.1).

It includes:
• Professional descriptions of medical terms and phrases for medical professionals
• Popular descriptions of medical terms and phrases for the general public
• Conversions between professional and popular descriptions
• Synonyms and antonyms for medical terms and phrases

The result is a standardized database in an instructional format. It is suitable for use in computational linguistics, natural language processing (NLP), medical informatics, for training and adapting large language models, developing medical chatbots and assistants in Slovene, supporting healthcare professionals in medical terminology, standardizing medical terminology in Slovene, education in the field of medicine, and conversion between professional and colloquial medical language.

For more details on the structure of the dataset, please consult 00README.txt.
dc.language.iso	slv
dc.publisher	Better, d.o.o.
dc.publisher	Faculty of Computer and Information Science, University of Ljubljana
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.source.uri	https://www.cjvt.si/povejmo/
dc.subject	instruction following dataset
dc.subject	large language models
dc.subject	medical texts
dc.subject	medical terminology
dc.title	Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Termset 1.0
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Borut Fabjan info@better.care Better, d.o.o.
sponsor	ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds
sponsor	ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info	975060 units
files.count	1
files.size	22081952

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: GaMS-Instruct-MED-Termset_1.0.zip
Size: 21.06 MB
Format: application/zip
Description: JSONL
MD5: 5fe40198b3f4b84f74c9ae0af502a06d
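
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal Python sketch, not part of the record itself; the function names and the local file path in the usage note are illustrative assumptions.

```python
import hashlib

# MD5 value as listed in this record for GaMS-Instruct-MED-Termset_1.0.zip
EXPECTED_MD5 = "5fe40198b3f4b84f74c9ae0af502a06d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected value, ignoring letter case."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved under its listed name in the working directory, `matches_listed_checksum("GaMS-Instruct-MED-Termset_1.0.zip")` should return `True` for an undamaged download.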

File Preview

GaMS-Instruct-MED-Termset_1.0
- GaMS-Instruct-MED-Termset_1.0_documentation.pdf (219 kB)
- GaMS-Instruct-MED-Termset_1.0.jsonl (483 MB)
- GaMS-Instruct-MED-Termset_1.0_statistics.txt (1 kB)
- GaMS-Instruct-MED-Termset_1.0_documentation.docx (32 kB)
- GaMS-Instruct-MED-Termset_1.0_documentation.md (17 kB)
- 00README.txt (14 kB)
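
Because the JSONL file is much larger than the archive that contains it, it may be convenient to stream records directly from the ZIP file rather than unpacking it first. The sketch below is one possible way to do this in Python. It assumes UTF-8 encoding and one JSON object per line, and it makes no assumption about field names, which are described in 00README.txt. The function name `iter_jsonl_from_zip` is hypothetical.

```python
import io
import json
import zipfile


def iter_jsonl_from_zip(zip_path, suffix=".jsonl"):
    """Yield one parsed JSON object per non-empty line of the first archive member ending in `suffix`."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(suffix)]
        if not members:
            raise FileNotFoundError(f"no member ending in {suffix!r} in {zip_path}")
        with archive.open(members[0]) as raw:
            # Assumption: the file is UTF-8 encoded text.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

For a first look at the structure, something like `next(iter_jsonl_from_zip("GaMS-Instruct-MED-Termset_1.0.zip"))` would return the first record without loading the rest of the file; the path here is an assumed local download location.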
