dc.contributor.author Plesnik, Emil
dc.contributor.author Tovornik, Robert
dc.contributor.author Fabjan, Borut
dc.contributor.author Radnić, Vuk
dc.contributor.author Marjanović, Anđela
dc.contributor.author Korošec, Filip
dc.contributor.author Žabkar, Ines
dc.contributor.author Kuzman, Ema
dc.contributor.author Rigler, Martin
dc.contributor.author Škufca, Lara
dc.contributor.author Satler, Maša
dc.date.accessioned 2026-02-16T16:05:55Z
dc.date.available 2026-02-16T16:05:55Z
dc.date.issued 2026-02-03
dc.identifier.uri http://hdl.handle.net/11356/2089
dc.description GaMS-Instruct-MED-Termset is an instruction-following dataset containing 975,060 prompt-response units in Slovene from the medical domain. It focuses on medical terms, with explanations for clinical and patient use and examples of their application.
The dataset is based on a set of medical terms obtained from Wikidata, accessible via the Wikidata Query Service (https://query.wikidata.org/). The initial set of terms was compared with the terms in the reference Slovenian Medical Dictionary published on Termania (https://www.termania.net/slovarji/95/slovenski-medicinski-slovar). Only matching terms were selected for further processing. The final set of terms was structured and enriched with descriptions generated using large language models (Azure OpenAI, GPT-4.1). It includes:
  • Professional descriptions of medical terms and phrases for medical professionals
  • Popular descriptions of medical terms and phrases for the general public
  • Conversions between professional and popular descriptions
  • Synonyms and antonyms for medical terms and phrases
The result is a standardized database in an instructional format. It is suitable for use in computational linguistics, natural language processing (NLP), and medical informatics. Intended applications include training and adapting large language models, developing medical chatbots and assistants in Slovene, supporting healthcare professionals with medical terminology, standardizing medical terminology in Slovene, medical education, and conversion between professional and colloquial medical language.
For more details on the structure of the dataset, please consult 00README.txt.
dc.language.iso slv
dc.publisher Better, d.o.o.
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/povejmo/
dc.subject instruction following dataset
dc.subject large language models
dc.subject medical texts
dc.subject medical terminology
dc.title Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Termset 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Borut Fabjan info@better.care Better, d.o.o.
sponsor ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 975060 units
files.count 1
files.size 22081952


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name GaMS-Instruct-MED-Termset_1.0.zip
Size 21.06 MB
Format application/zip
Description JSONL
MD5 5fe40198b3f4b84f74c9ae0af502a06d
 File Preview  
  • GaMS-Instruct-MED-Termset_1.0
    • GaMS-Instruct-MED-Termset_1.0_documentation.pdf (219 kB)
    • GaMS-Instruct-MED-Termset_1.0.jsonl (483 MB)
    • GaMS-Instruct-MED-Termset_1.0_statistics.txt (1 kB)
    • GaMS-Instruct-MED-Termset_1.0_documentation.docx (32 kB)
    • GaMS-Instruct-MED-Termset_1.0_documentation.md (17 kB)
    • 00README.txt (14 kB)
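
Since the record publishes an MD5 checksum and describes the payload as JSONL inside a zip archive, a downloaded copy could be checked and read roughly as follows. This is a minimal sketch, not part of the record: the function names are hypothetical, UTF-8 encoding is assumed, and nothing is assumed about the field names inside each JSONL line (consult 00README.txt for those).

```python
import hashlib
import io
import json
import zipfile
from typing import Iterator

# MD5 checksum as published in this record for the zip archive.
EXPECTED_MD5 = "5fe40198b3f4b84f74c9ae0af502a06d"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_jsonl_from_zip(zip_path: str) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line of every .jsonl member.

    Assumption: members are UTF-8 encoded and each line holds one JSON object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".jsonl"):
                continue
            with archive.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if line.strip():
                        yield json.loads(line)
```

In use, one would compare `md5_of_file("GaMS-Instruct-MED-Termset_1.0.zip")` against `EXPECTED_MD5` before iterating over `iter_jsonl_from_zip(...)`. Streaming line by line avoids loading the 483 MB JSONL file into memory at once.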
