dc.contributor.author Šorn, Mojca
dc.contributor.author Cvek, Ana
dc.contributor.author Skubic, Jure
dc.contributor.author Logar, Tamara
dc.contributor.author Zagoranski, Sašo
dc.contributor.author Bratanović, Alen
dc.date.accessioned 2024-10-02T11:20:35Z
dc.date.available 2024-10-02T11:20:35Z
dc.date.issued 2024-09-25
dc.identifier.uri http://hdl.handle.net/11356/1975
dc.description GaMS-Instruct-DH is an instruction-following dataset designed for fine-tuning Slovene large language models to follow instructions. It consists of prompt-response pairs, some of which contain an additional context field, as well as a field listing the source of the information included in the response. The dataset focuses on prompts from the fields of digital humanities and museum documentation. Its primary goal is to provide a resource that allows large language models already available for digital humanities to be extended to Slovene and to similar, less-resourced languages (e.g. Bosnian). Version 1.0 includes approximately 10,000 prompt-response pairs, compiled entirely by hand by a team of linguists and digital humanities experts.
dc.language.iso slv
dc.publisher Institute of Contemporary History
dc.publisher Semantika d.o.o.
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/povejmo/en/project/
dc.subject instruction following dataset
dc.subject large language models
dc.subject digital humanities
dc.subject museum documentation
dc.title Slovene instruction-following dataset for large language models GaMS-Instruct-DH 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Sašo Zagoranski info@semantika.si Semantika d.o.o.
sponsor ARIS (Slovenian Research and Innovation Agency) NOO PoVeJMo research project (Adaptive Natural Language Processing with Large Language Models) nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
size.info 10150 units
files.count 1
files.size 910297


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name GaMS-Instruct-DH_1.0.zip
Size 888.96 KB
Format application/zip
Description GaMS-Instruct-DH 1.0 (JSON)
MD5 1bb22354350323575d64341e1b50dea2
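
The MD5 checksum and byte size listed in this record can be used to check that a downloaded copy of the archive is intact. The following is a minimal Python sketch, not part of the record itself: the function names md5_of_file and verify_download are hypothetical, and it assumes the archive has been saved locally under its listed name.

```python
import hashlib
from pathlib import Path

# Values copied from this record (MD5 and files.size, in bytes).
EXPECTED_MD5 = "1bb22354350323575d64341e1b50dea2"
EXPECTED_SIZE = 910297


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file at `path` matches the expected size and MD5."""
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False
    return md5_of_file(p) == expected_md5.lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved in the current directory.
    archive = Path("GaMS-Instruct-DH_1.0.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The size comparison runs first because it is cheap and catches truncated downloads before the file is hashed.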