<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href='static/style.xsl' type='text/xsl'?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-04T07:41:53Z</responseDate><request verb="GetRecord" identifier="oai:www.clarin.si:11356/2050" metadataPrefix="oai_dc">http://www.clarin.si/repository/oai/request</request><GetRecord><record><header><identifier>oai:www.clarin.si:11356/2050</identifier><datestamp>2025-09-23T15:08:25Z</datestamp><setSpec>hdl_11356_1023</setSpec><setSpec>hdl_11356_1024</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Slovenian Dataset for Vision-Language Model Instruction-Tuning SLO-VLM-IT-Dataset 1.0</dc:title>
<dc:creator>Martinc, Matej</dc:creator>
<dc:subject>large language models</dc:subject>
<dc:subject>multimodal</dc:subject>
<dc:subject>vision-language models</dc:subject>
<dc:subject>instruction following dataset</dc:subject>
<dc:description>This entry contains the SLO-VLM-IT-Dataset, a dataset for instruction-tuning vision-language models in the Slovenian language. It is composed of five main .json files, which together provide examples for training and fine-tuning models to process both visual and textual information in Slovenian.&#xd;
&#xd;
1. llava_v1_5_mix665k_translated_gemini_1_5_pro_all.json&#xd;
This file contains a machine-translated version of the popular Llava_v1_5_mix665k dataset. The translation from English to Slovenian was performed using the proprietary Gemini 1.5 Pro model.&#xd;
&#xd;
2. wiki_14_march_2024_latest.json&#xd;
This file consists of conversational examples generated from Slovenian Wikipedia articles. The proprietary Gemini 1.5 Pro model was used to transform the articles into an instruction-tuning format.&#xd;
&#xd;
3. rtv.json&#xd;
This file consists of conversational examples generated from images on the news portal https://www.rtvslo.si. The proprietary Gemini 1.5 Pro model was used for the data generation.&#xd;
&#xd;
4. siol.json&#xd;
This file consists of conversational examples generated from images on the news portal https://siol.net. The proprietary Gemini 1.5 Pro model was used for the data generation.&#xd;
&#xd;
5. 24ur.json&#xd;
This file consists of conversational examples generated from images on the news portal https://www.24ur.com. The proprietary Gemini 1.5 Pro model was used for the data generation.&#xd;
&#xd;
The combined dataset includes a total of 1,128,228 examples, categorized as follows:&#xd;
&#xd;
21,838 textvqa examples: Instructions for visual question answering based on specific Optical Character Recognition (OCR) tokens.&#xd;
&#xd;
349,369 coco examples: A mix of instructions corresponding to 118,000 images from the COCO 2017 Object Detection Dataset. These include tasks such as generating long image descriptions, providing single-word answers, and answering multiple-choice questions.&#xd;
&#xd;
81,309 vg examples: Instructions to either provide bounding box coordinates for a specified region in an image or describe a region defined by given coordinates.&#xd;
&#xd;
66,227 gqa examples: Instructions requiring a one-word or one-phrase response to a question about the corresponding image.&#xd;
&#xd;
78,976 ocr_vqa examples: Instructions focused on performing OCR to extract text from an image.&#xd;
&#xd;
139,433 wiki examples: Instruction-tuning examples generated from Slovenian Wikipedia articles. The original Wikipedia articles were obtained from a Wikipedia database dump from March 14th, 2025.&#xd;
&#xd;
100,000 rtv examples: Instruction-tuning examples generated from images on the news portal https://www.rtvslo.si. Image scraping was completed on February 7th, 2025.&#xd;
&#xd;
100,000 siol examples: Instruction-tuning examples generated from images on the news portal https://siol.net. Image scraping was completed on March 22nd, 2025.&#xd;
&#xd;
100,000 24ur examples: Instruction-tuning examples generated from images on the news portal https://www.24ur.com. Image scraping was completed on February 7th, 2025.&#xd;
&#xd;
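The per-category counts above can be re-derived from the .json files with a short tally. The following is a minimal sketch, not part of the distributed scripts. It assumes that each file holds a list of example objects, that the 'image' value of a translated Llava example is a relative path whose first component names the category (as in the original English Llava_v1_5_mix665k release), and that news portal examples carry a full URL. The function names and the labels "no_image" and "url" are hypothetical.

```python
from collections import Counter


def category_of(example):
    """Guess a category label from the example's 'image' value (assumed layout)."""
    image = example.get("image")
    if not image:
        return "no_image"  # hypothetical label for examples without an image
    if image.startswith(("http://", "https://")):
        return "url"  # hypothetical label for news portal examples
    # Assumption: relative paths start with the category, e.g. "coco/train2017/..."
    return image.split("/", 1)[0]


def count_categories(examples):
    """Count examples per guessed category."""
    return Counter(category_of(example) for example in examples)
```

Loading one of the files with json.load and passing the resulting list to count_categories gives a Counter that can be compared against the figures listed above.
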
Accessing the Corresponding Images&#xd;
&#xd;
News Portal Images&#xd;
The images corresponding to the 'rtv', 'siol' and '24ur' examples need to be downloaded from the appropriate news portal. Each example in the corresponding JSON file contains an 'image' key with the URL of its image.&#xd;
&#xd;
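Fetching the news portal images could look roughly like the sketch below. It is not part of the distributed scripts and rests on assumptions: that the top level of each JSON file is a list of examples, and that the file name at the end of each URL is unique enough to serve as the local file name. The function names and the output directory are hypothetical.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def local_name(url, out_dir):
    """Map an image URL to a path inside out_dir, keeping the URL's file name."""
    name = os.path.basename(urlparse(url).path) or "image"
    return os.path.join(out_dir, name)


def collect_image_urls(examples):
    """Return the unique 'image' URLs from a list of examples, in first-seen order."""
    seen = {}
    for example in examples:
        url = example.get("image")
        if url and url not in seen:
            seen[url] = True
    return list(seen)


def download_images(json_path, out_dir):
    """Download every referenced image into out_dir, skipping files already present."""
    with open(json_path, encoding="utf-8") as f:
        examples = json.load(f)  # assumption: a list of example objects
    os.makedirs(out_dir, exist_ok=True)
    for url in collect_image_urls(examples):
        target = local_name(url, out_dir)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)


# Example call (performs network requests):
# download_images("rtv.json", "images/rtv")
```

Two different URLs that end in the same file name would collide under this naming scheme, and some images may no longer be available at the portal, so a production script would need more robust naming, error handling and rate limiting.
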
Wiki Images&#xd;
The images corresponding to the 'wiki' examples are available for download at the following link:&#xd;
https://kt-cloud.ijs.si/index.php/s/nbLmWkaJEXHMMwe&#xd;
&#xd;
Llava_v1_5_mix665k Images&#xd;
To facilitate the download of images for the translated Llava_v1_5_mix665k dataset, we provide the necessary Python script get_llava_images.py and its dependency overwatch.py.</dc:description>
<dc:date>2025-09-18</dc:date>
<dc:type>corpus</dc:type>
<dc:identifier>http://hdl.handle.net/11356/2050</dc:identifier>
<dc:language>slv</dc:language>
<dc:rights>Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-nc/4.0/</dc:rights>
<dc:rights>PUB</dc:rights>
<dc:format>application/zip</dc:format>
<dc:format>text/plain; charset=utf-8</dc:format>
<dc:format>downloadable_files_count: 1</dc:format>
<dc:publisher>Jožef Stefan Institute</dc:publisher>
<dc:source>https://www.cjvt.si/llm4dh/</dc:source>
</oai_dc:dc>
</metadata></record></GetRecord></OAI-PMH>