| dc.contributor.author | Plesnik, Emil |
| dc.contributor.author | Tovornik, Robert |
| dc.contributor.author | Fabjan, Borut |
| dc.contributor.author | Radnić, Vuk |
| dc.contributor.author | Marjanović, Anđela |
| dc.contributor.author | Korošec, Filip |
| dc.contributor.author | Žabkar, Ines |
| dc.contributor.author | Kuzman, Ema |
| dc.contributor.author | Rigler, Martin |
| dc.contributor.author | Škufca, Lara |
| dc.contributor.author | Satler, Maša |
| dc.date.accessioned | 2026-02-16T16:05:55Z |
| dc.date.available | 2026-02-16T16:05:55Z |
| dc.date.issued | 2026-02-03 |
| dc.identifier.uri | http://hdl.handle.net/11356/2089 |
| dc.description | GaMS-Instruct-MED-Termset is an instruction-following dataset containing 975,060 prompt-response units in Slovene from the medical domain. It focuses on medical terms, with explanations for clinical and patient use and examples of their application. The dataset is based on a set of medical terms obtained from Wikidata, accessible via the Wikidata Query Service (https://query.wikidata.org/). The initial set of terms was compared with the terms in the reference Slovenian Medical Dictionary published on Termania (https://www.termania.net/slovarji/95/slovenski-medicinski-slovar). Only matching terms were selected for further processing. The final set of terms was structured and enriched with descriptions generated using large language models (Azure OpenAI, GPT-4.1). It includes: (1) professional descriptions of medical terms and phrases for medical professionals; (2) popular descriptions of medical terms and phrases for the general public; (3) conversions between professional and popular descriptions; and (4) synonyms and antonyms for medical terms and phrases. The result is a standardized database in an instructional format. It is suitable for computational linguistics, natural language processing (NLP), and medical informatics; for training and adapting large language models; for developing medical chatbots and assistants in Slovene; for supporting healthcare professionals with medical terminology; for standardizing medical terminology in Slovene; for medical education; and for conversion between professional and colloquial medical language. For more details on the structure of the dataset, please consult 00README.txt. |
| dc.language.iso | slv |
| dc.publisher | Better, d.o.o. |
| dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://www.cjvt.si/povejmo/ |
| dc.subject | instruction following dataset |
| dc.subject | large language models |
| dc.subject | medical texts |
| dc.subject | medical terminology |
| dc.title | Slovene instruction-following dataset for large language models GaMS-Instruct-MED-Termset 1.0 |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Borut Fabjan info@better.care Better, d.o.o. |
| sponsor | ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds |
| sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
| size.info | 975060 units |
| files.count | 1 |
| files.size | 22081952 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: GaMS-Instruct-MED-Termset_1.0.zip
- Size: 21.06 MB
- Format: application/zip
- Description: JSONL
- MD5: 5fe40198b3f4b84f74c9ae0af502a06d
- GaMS-Instruct-MED-Termset_1.0
  - GaMS-Instruct-MED-Termset_1.0_documentation.pdf (219 kB)
  - GaMS-Instruct-MED-Termset_1.0.jsonl (483 MB)
  - GaMS-Instruct-MED-Termset_1.0_statistics.txt (1 kB)
  - GaMS-Instruct-MED-Termset_1.0_documentation.docx (32 kB)
  - GaMS-Instruct-MED-Termset_1.0_documentation.md (17 kB)
  - 00README.txt (14 kB)
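
The MD5 value listed for the archive can be used to check a download for corruption. The following is a minimal sketch in Python using only the standard library; it is not part of the published record. The local path in the example is an assumption (the archive saved under its listed name in the working directory), and the helper names `md5_of_file` and `matches_published_md5` are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 digest as published in the file listing of this record.
PUBLISHED_MD5 = "5fe40198b3f4b84f74c9ae0af502a06d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=PUBLISHED_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive downloaded into the current directory.
    archive = Path("GaMS-Instruct-MED-Termset_1.0.zip")
    if archive.exists():
        print("checksum OK" if matches_published_md5(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it from the item page first")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.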