| dc.contributor.author | Plesnik, Emil |
| dc.contributor.author | Morić, Ariana |
| dc.contributor.author | Tovornik, Robert |
| dc.contributor.author | Fabjan, Borut |
| dc.date.accessioned | 2026-02-10T16:00:13Z |
| dc.date.available | 2026-02-10T16:00:13Z |
| dc.date.issued | 2026-02-03 |
| dc.identifier.uri | http://hdl.handle.net/11356/2081 |
| dc.description | GaMS-Instruct-PHARMA is an instruction-following dataset for fine-tuning Slovene large language models to follow instructions in the medical domain, particularly on pharmaceutical drugs and their effects. The dataset is based on official Slovene pharmaceutical databases that are publicly accessible on the websites of the Slovenian Database of Medicinal Products (Centralna baza zdravil, https://www.cbz.si) and the Agency for Medicinal Products and Medical Devices of the Republic of Slovenia (Javna agencija Republike Slovenije za zdravila in medicinske pripomočke; JAZMP; https://www.jazmp.si). Version 1.0 contains 482,276 instructions (i.e. prompt-response pairs) and is relevant to natural language processing, computational linguistics, and medical informatics. It can be used in research and development projects for training and fine-tuning LLMs for the pharmaceutical domain, developing medical chatbots and assistants in Slovene, and supporting pharmaceutical and medical workers in searching for information on pharmaceutical drugs. The dataset consists of two data files: (1) JSON: GaMS-Instruct-PHARMA_1.0.json (235 MB), formatted for inspection; (2) JSONL: GaMS-Instruct-PHARMA_1.0.jsonl (210 MB), optimized for training models. Statistics on the dataset are provided in GaMS-Instruct-PHARMA_1.0_dataset_statistics.json. For more information, please consult 00README.txt and the accompanying documentation. Please note that the current version of the dataset does not guarantee full clinical accuracy and may contain errors as a consequence of LLM hallucinations. |
| dc.language.iso | slv |
| dc.publisher | Better, d.o.o. |
| dc.publisher | Faculty of Computer and Information Science, University of Ljubljana |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://www.cjvt.si/povejmo/ |
| dc.subject | instruction following dataset |
| dc.subject | medical texts |
| dc.subject | large language models |
| dc.subject | pharmaceutical texts |
| dc.title | Slovene instruction-following dataset for large language models GaMS-Instruct-PHARMA 1.0 |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Borut Fabjan info@better.care Better, d.o.o. |
| sponsor | ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds |
| sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
| size.info | 482276 units |
| files.count | 1 |
| files.size | 49893207 |
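The description above names a JSONL file intended for training and a record count of 482,276. A minimal sketch of how one might stream that file and check the count is shown below. The local path is an assumption (the file as unpacked from the archive), and the record does not state the field names, so the sketch only collects whatever keys it finds rather than assuming a schema; 00README.txt remains the authoritative description.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path: Path, expected: int = 482276) -> dict:
    """Count records and collect the top-level field names seen in the file."""
    count = 0
    keys = set()
    for record in iter_jsonl(path):
        count += 1
        if isinstance(record, dict):
            keys.update(record)
    return {
        "count": count,
        "keys": sorted(keys),
        "matches_expected": count == expected,
    }


if __name__ == "__main__":
    # Assumed local path after unpacking the archive; adjust as needed.
    data_file = Path("GaMS-Instruct-PHARMA_1.0.jsonl")
    if data_file.exists():
        print(summarize(data_file))
```

Streaming line by line avoids loading the roughly 200 MB file into memory at once, which is the usual reason to prefer the JSONL variant over the single JSON document.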
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: GaMS-Instruct-PHARMA_1.0.zip
- Size: 47.58 MB
- Format: application/zip
- Description: JSON + JSONL
- MD5: 69613d11439e42c87bc4245a1b0a3761
- GaMS-Instruct-PHARMA_1.0
  - GaMS-Instruct-PHARMA_1.0_docs.pdf (242 kB)
  - GaMS-Instruct-PHARMA_1.0_dataset_statistics.json (787 B)
  - GaMS-Instruct-PHARMA_1.0_docs.md (16 kB)
  - GaMS-Instruct-PHARMA_1.0.jsonl (201 MB)
  - GaMS-Instruct-PHARMA_1.0_docs.docx (36 kB)
  - 00README.txt (14 kB)
  - GaMS-Instruct-PHARMA_1.0.json (225 MB)
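
The MD5 value and archive contents listed above can be checked after download. The following is a sketch under the assumption that the archive was saved locally as GaMS-Instruct-PHARMA_1.0.zip; the function names are illustrative and not part of the dataset's tooling.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import List, Tuple

# Checksum as published in this record.
EXPECTED_MD5 = "69613d11439e42c87bc4245a1b0a3761"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_archive(path: Path) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each zip entry."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    archive_path = Path("GaMS-Instruct-PHARMA_1.0.zip")
    if archive_path.exists():
        print("checksum ok:", md5_of_file(archive_path) == EXPECTED_MD5)
        for name, size in list_archive(archive_path):
            print(f"{size:>12}  {name}")
```

MD5 is used here only because it is the checksum the record publishes; it detects a corrupted download but is not a guarantee against deliberate tampering.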