
 
dc.contributor.author Plesnik, Emil
dc.contributor.author Morić, Ariana
dc.contributor.author Tovornik, Robert
dc.contributor.author Fabjan, Borut
dc.date.accessioned 2026-02-10T16:00:13Z
dc.date.available 2026-02-10T16:00:13Z
dc.date.issued 2026-02-03
dc.identifier.uri http://hdl.handle.net/11356/2081
dc.description GaMS-Instruct-PHARMA is an instruction-following dataset designed for fine-tuning Slovene large language models to follow instructions in the medical domain, particularly on pharmaceutical drugs and their effects. The dataset is based on official Slovene pharmaceutical databases that are publicly accessible on the websites of the Slovenian Database of Medicinal Products (Centralna baza zdravil, https://www.cbz.si) and the Agency for Medicinal Products and Medical Devices of the Republic of Slovenia (Javna agencija Republike Slovenije za zdravila in medicinske pripomočke; JAZMP; https://www.jazmp.si). Version 1.0 contains 482,276 instructions (i.e., prompt-response pairs) and is intended for work in natural language processing, computational linguistics, and medical informatics. It can be used in research and development projects, for example to train and fine-tune LLMs for the pharmaceutical domain, to develop medical chatbots and assistants in Slovene, and to help pharmaceutical and medical professionals search for information on pharmaceutical drugs. The dataset consists of two data files: • JSON: GaMS-Instruct-PHARMA_1.0.json (235 MB), formatted for inspection; • JSONL: GaMS-Instruct-PHARMA_1.0.jsonl (210 MB), optimized for model training. Statistics on the dataset are provided in GaMS-Instruct-PHARMA_1.0_dataset_statistics.json. For more information, please consult 00README.txt and the accompanying documentation. Please note that the current version of the dataset does not guarantee full clinical accuracy and may contain errors resulting from LLM hallucinations.
dc.language.iso slv
dc.publisher Better, d.o.o.
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/povejmo/
dc.subject instruction following dataset
dc.subject medical texts
dc.subject large language models
dc.subject pharmaceutical texts
dc.title Slovene instruction-following dataset for large language models GaMS-Instruct-PHARMA 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Borut Fabjan info@better.care Better, d.o.o.
sponsor ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 482276 units
files.count 1
files.size 49893207
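The files.size value is given in bytes, while the file listing reports the archive as 47.58 MB. The short Python sketch below shows that the listed figure corresponds to binary megabytes (MiB, 2^20 bytes) rather than decimal megabytes; the helper names `to_mib` and `to_mb` are hypothetical. The same convention may account for the description giving 235 MB and 210 MB for the JSON and JSONL files while the listing shows 225 MB and 201 MB, although the record does not state which unit the description uses.

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to binary megabytes (1 MiB = 1,048,576 bytes)."""
    return num_bytes / 2**20


def to_mb(num_bytes: int) -> float:
    """Convert a byte count to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 10**6


size = 49893207  # files.size from the record, in bytes
print(f"{to_mib(size):.2f} MiB")  # 47.58 MiB, the figure shown in the file listing
print(f"{to_mb(size):.2f} MB")    # 49.89 MB
```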


 Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: GaMS-Instruct-PHARMA_1.0.zip
Size: 47.58 MB
Format: application/zip
Description: JSON + JSONL
MD5: 69613d11439e42c87bc4245a1b0a3761
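The published MD5 checksum can be used to confirm that the downloaded archive arrived intact. A minimal Python sketch using the standard `hashlib` module; the helper names `md5sum` and `verify_download` are hypothetical, and the file is read in chunks so that memory use stays small regardless of archive size.

```python
import hashlib

# Checksum published in the file listing for GaMS-Instruct-PHARMA_1.0.zip.
EXPECTED_MD5 = "69613d11439e42c87bc4245a1b0a3761"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with a published checksum, ignoring case."""
    return md5sum(path) == expected.strip().lower()
```

For example, `verify_download("GaMS-Instruct-PHARMA_1.0.zip")` would return True for an intact copy saved under that name. MD5 is adequate for detecting accidental corruption during transfer, but it is not a safeguard against deliberate tampering.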
 File preview
  • GaMS-Instruct-PHARMA_1.0
    • GaMS-Instruct-PHARMA_1.0_docs.pdf (242 kB)
    • GaMS-Instruct-PHARMA_1.0_dataset_statistics.json (787 B)
    • GaMS-Instruct-PHARMA_1.0_docs.md (16 kB)
    • GaMS-Instruct-PHARMA_1.0.jsonl (201 MB)
    • GaMS-Instruct-PHARMA_1.0_docs.docx (36 kB)
    • 00README.txt (14 kB)
    • GaMS-Instruct-PHARMA_1.0.json (225 MB)
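Because the JSONL file stores one JSON object per line, it can be streamed record by record straight from the ZIP archive, without extracting roughly 200 MB to disk. The Python sketch below uses the standard `zipfile` and `json` modules. It assumes that the member path inside the archive is `GaMS-Instruct-PHARMA_1.0/GaMS-Instruct-PHARMA_1.0.jsonl`, as suggested by the preview, and it does not rely on any particular field names for the prompt-response pairs, since the record does not specify them. The names `JSONL_MEMBER`, `list_members`, `iter_jsonl_records` and `count_records` are hypothetical.

```python
import io
import json
import zipfile
from typing import Any, Iterator, List, Tuple

# Assumed path of the JSONL file inside the archive, inferred from the preview.
JSONL_MEMBER = "GaMS-Instruct-PHARMA_1.0/GaMS-Instruct-PHARMA_1.0.jsonl"


def list_members(zip_path: str) -> List[Tuple[str, int]]:
    """Return (name, uncompressed size in bytes) for every file in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist() if not info.is_dir()]


def iter_jsonl_records(zip_path: str, member: str = JSONL_MEMBER) -> Iterator[Any]:
    """Yield one parsed JSON object per non-empty line of a JSONL member."""
    with zipfile.ZipFile(zip_path) as zf:
        # Decode the compressed byte stream as UTF-8 text on the fly.
        with io.TextIOWrapper(zf.open(member), encoding="utf-8") as text:
            for line_no, line in enumerate(text, start=1):
                line = line.strip()
                if not line:
                    continue  # skip blank lines, e.g. a trailing newline
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"{member}:{line_no}: invalid JSON") from exc


def count_records(zip_path: str, member: str = JSONL_MEMBER) -> int:
    """Count records while holding only one line in memory at a time."""
    return sum(1 for _ in iter_jsonl_records(zip_path, member))
```

If each instruction occupies exactly one line, as the JSONL format implies, `count_records("GaMS-Instruct-PHARMA_1.0.zip")` would be expected to return 482,276, the number of instructions stated for version 1.0.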
