<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>CLARIN.SI data &amp; tools</title>
<link>http://hdl.handle.net/11356/1024</link>
<description>CLARIN.SI repository language resources and tools</description>
<pubDate>Thu, 16 Apr 2026 21:57:41 GMT</pubDate>
<dc:date>2026-04-16T21:57:41Z</dc:date>
<item>
<title>Slovene Lexicographic QA Fine-Tuning Corpus SloLexQA 1.0</title>
<link>http://hdl.handle.net/11356/2116</link>
<description>Slovene Lexicographic QA Fine-Tuning Corpus SloLexQA 1.0
Knez, Timotej; Žitnik, Slavko
The Slovene Lexicographic QA Fine-Tuning Corpus is a specialized dataset designed to improve how AI models handle the structural, grammatical, and semantic nuances of the Slovene language. Comprising over 16,000 question-answer pairs, the corpus focuses on lexicographic data rather than general knowledge, covering morphology, lemmatization, and part-of-speech identification. It is intended as a resource for fine-tuning models to act as linguistic assistants for Slovene.&#13;
&#13;
The dataset integrates diverse sources, ranging from content automatically generated from the Digital Dictionary Database of Slovene (DDDS) to manually written expert advice from the Jezikovna svetovalnica portal. This combination pairs systematic grammatical queries with nuanced explanations of real-world usage. Because a significant portion of the data is derived from annotated linguistic corpora such as SSJ500k, the dataset supports training on both context-free definitions and context-dependent usage.&#13;
&#13;
The corpus is structured for machine learning workflows: it comes with a 90/10 training/test split, and every entry carries metadata. Questions are categorized into specific types, such as definitions and usage examples, which lets researchers perform targeted domain adaptation. Because each question is linked to a specific lexeme, the corpus enables precise evaluation of a model's ability to apply both the formal rules and the practical usage of the Slovene lexicon.
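
As an illustration of targeted domain adaptation, the sketch below selects one question type from one split and groups entries by lexeme. It assumes a JSON Lines export in which each record has the hypothetical fields question, answer, question_type, lexeme and split. The actual field names and type labels in SloLexQA may differ, so check the distributed files before adapting it.

```python
import json
from collections import defaultdict

# Hypothetical sample in JSON Lines form; the field names (question, answer,
# question_type, lexeme, split) and the type labels are assumptions.
SAMPLE_JSONL = "\n".join(json.dumps(r) for r in [
    {"question": "Kaj pomeni beseda miza?", "answer": "(answer text)",
     "question_type": "definition", "lexeme": "miza", "split": "train"},
    {"question": "Uporabi besedo miza v povedi.", "answer": "(answer text)",
     "question_type": "usage_example", "lexeme": "miza", "split": "train"},
    {"question": "Kaj pomeni beseda stol?", "answer": "(answer text)",
     "question_type": "definition", "lexeme": "stol", "split": "train"},
    {"question": "Kako se sklanja samostalnik stol?", "answer": "(answer text)",
     "question_type": "morphology", "lexeme": "stol", "split": "test"},
])


def load_records(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def select(records, split, question_type):
    """Keep records from one split (e.g. "train") that have one question type."""
    return [r for r in records
            if r["split"] == split and r["question_type"] == question_type]


def group_by_lexeme(records):
    """Group records by lexeme so that results can be reported per lexeme."""
    groups = defaultdict(list)
    for r in records:
        groups[r["lexeme"]].append(r)
    return dict(groups)


records = load_records(SAMPLE_JSONL)
print([r["lexeme"] for r in select(records, "train", "definition")])  # ['miza', 'stol']
print({k: len(v) for k, v in group_by_lexeme(records).items()})  # {'miza': 2, 'stol': 2}
```

Grouping by lexeme makes it straightforward to report accuracy per lexeme rather than only as a single overall score.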
</description>
<pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2116</guid>
<dc:date>2026-04-14T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Tisoč in nobena noč" (A Thousand and No Nights)</title>
<link>http://hdl.handle.net/11356/2113</link>
<description>Sample from the audiobook "Tisoč in nobena noč" (A Thousand and No Nights)
Gradišnik, Branko
This entry includes the first part of the audiobook "Tisoč in nobena noč" (A Thousand and No Nights) by author Branko Gradišnik (COBISS.ID: 270943235, ISBN: 978-961-291-533-9). &#13;
&#13;
The world is a wonderful and fascinating place; we know that. But when Branko Gradišnik wanders through it, this fact is revealed to us again and again in new and unexpected ways. This time is no different: Branko sets foot in Cuba, an island of music and dance, a land of extraordinary beauty and an astonishing number of Olympic champions. Gradišnik blends all of this with a solid dose of insights from the fields of ketosis, life analytics, and flirtology, serving up his Cuba as a top-tier travelogue dish seasoned with countless stories. Yet something is slightly different this time: in Cuba, Branko meets someone who is (almost?) his equal in the ancient art of storytelling and spinning tales. What lesson can we draw from this? Sometimes you have to travel to the other side of the world to meet a truly talkative Styrian. So, meet Cuba à la Branko Gradišnik and Deni Hedžet.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2113</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Srce in kamen: kontemplacija o življenju v gorah" (Heart and Stone: A Contemplation on Life in the Mountains)</title>
<link>http://hdl.handle.net/11356/2114</link>
<description>Sample from the audiobook "Srce in kamen: kontemplacija o življenju v gorah" (Heart and Stone: A Contemplation on Life in the Mountains)
Legragić, Vid
This entry contains the first part of the audiobook "Srce in kamen: kontemplacija o življenju v gorah" (Heart and Stone: A Contemplation on Life in the Mountains) by author Vid Legragić (COBISS.ID: 274410755, ISBN: 978-961-291-535-3).&#13;
&#13;
The book brings together 12 stories, best appreciated by lovers of the mountains.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2114</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Faktor X" (Factor X)</title>
<link>http://hdl.handle.net/11356/2112</link>
<description>Sample from the audiobook "Faktor X" (Factor X)
Sivec, Ivan
This entry contains the first part of the audiobook "Faktor X" (Factor X) by author Ivan Sivec (COBISS.ID: 273985283, ISBN: 978-961-7143-59-1).&#13;
&#13;
The social-psychological novel "Faktor X" by writer Ivan Sivec, subtitled Confession of a Naive Model, is a tragic story with a happy ending. Maša Poglajen, who is almost eighteen, considers herself very mature and believes she can take care of herself. When she enters the glamorous world of modeling, it seems so beautiful that she is completely blinded by it. Maša soon falls into drug addiction. Photoshoots around the world and high fees repeatedly convince her of the importance of her work. Only after being raped does she realize that the fashion world is not for her, and she returns home. Her family’s attitude toward Maša’s career is also worrying: they think it right for the girl to capitalize on her beauty on the fashion runways of Milan, London, and New York. The novel can be read primarily as a serious reflection addressed to girls who are tempted by such a career.&#13;
For readers not drawn to this kind of self-affirmation, the work offers a vivid portrait of a fashion world that looks beautiful from the outside but is much harsher behind the scenes, where secret activities are controlled mainly by middle-aged bald men.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2112</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Finta v levo" (A Feint to the Left)</title>
<link>http://hdl.handle.net/11356/2111</link>
<description>Sample from the audiobook "Finta v levo" (A Feint to the Left)
Sivec, Ivan
This entry contains the first part of the audiobook "Finta v levo" (A Feint to the Left) by author Ivan Sivec (COBISS.ID: 272646147, ISBN: 978-961-7143-55-3).&#13;
&#13;
The book addresses the topical issue of drug use, drug dealing, and the rave scene in our region. The story was told to the writer by its main character, Jure.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2111</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Princ na belem konju" (Knight in Shining Armor)</title>
<link>http://hdl.handle.net/11356/2110</link>
<description>Sample from the audiobook "Princ na belem konju" (Knight in Shining Armor)
Sivec, Ivan
This entry contains the first part of the audiobook "Princ na belem konju" (Knight in Shining Armor) by author Ivan Sivec (COBISS.ID: 272746243, ISBN: 978-961-7143-56-0).&#13;
&#13;
This is the tenth story in the Happy Family series. It develops from a melodrama into a love polygon and, after a fatal accident, turns into a Karst crime story. &#13;
Love for horses is often even stronger than love for people, or between people. Yet on the Karst terrain, among lovers of these noble animals, a love triangle emerges that is not only about horses. In Rdeči Kal, on a secluded estate between the world-famous Lipica and the small town of Sežana, it is precisely because of love that a… fatal accident occurs! The romantic melodrama gallops forward into a crime story, featuring not only all the members of the happy (or rather comical) Erjavec family, but also Gal, a Lipica trainer, and Štef, a rider from Prlekija.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2110</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Kralj Samo" (King Samo)</title>
<link>http://hdl.handle.net/11356/2109</link>
<description>Sample from the audiobook "Kralj Samo" (King Samo)
Sivec, Ivan
This entry contains the first part of the audiobook "Kralj Samo" (King Samo), by author Ivan Sivec (COBISS.SI-ID 273112067, ISBN 978-961-7143-58-4 (MP3)).&#13;
&#13;
“King Samo – Part I of the Saga of Carantania” (a novel of heroism and love) is an epic tale about the renowned King Samo, the young hero Vitomir, and the restless world they helped shape in the 7th century. With a remarkably broad narrative sweep, the author vividly portrays the lives of pagans imbued with a poetic understanding of nature, love, and solidarity. Their values could serve as a foundation for a truly democratic society. Yet when the time for conflict arrives, we witness strategic tactics that ensure both survival and decades of peace.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2109</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Jutri bom umrl" (Tomorrow I Will Die)</title>
<link>http://hdl.handle.net/11356/2108</link>
<description>Sample from the audiobook "Jutri bom umrl" (Tomorrow I Will Die)
Sivec, Ivan
This entry contains the first part of the audiobook "Jutri bom umrl" (Tomorrow I Will Die), by author Ivan Sivec (COBISS.SI-ID 272746499, ISBN 978-961-7143-57-7 (MP3)).&#13;
&#13;
The novel “Tomorrow I Will Die”, subtitled Confessions of a Ljubljana Playboy, draws attention to the plague of modern times: AIDS. It tells the unusual story of brothers Luka and Jaka, who live life to the fullest without thinking about the future. &#13;
In this life, everything is possible, perhaps even everything is forgivable. Yet each person must take care of the development of their own personality and be responsible for their actions. If one truly lives this way, they should not be surprised when injustice happens to them. This applies both to the individual and to society as a whole. In Slovenia, we must not be misled by an apparent calm: the experiences of some nearby countries are far too alarming for complacency. In Tomorrow I Will Die, writer Ivan Sivec addresses this issue in a somewhat unusual yet vividly realistic and deeply moving way, one that is highly necessary for our young country.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2108</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Sample from the audiobook "Vlomilci delajo poleti" (Burglars Work in the Summer)</title>
<link>http://hdl.handle.net/11356/2104</link>
<description>Sample from the audiobook "Vlomilci delajo poleti" (Burglars Work in the Summer)
Sivec, Ivan; Grm, Lucija
This entry contains the first part of the audiobook "Vlomilci delajo poleti" (Burglars Work in the Summer), by author Ivan Sivec (COBISS.SI-ID 274021123, ISBN 978-961-7143-60-7 (MP3)).&#13;
&#13;
“Burglars Work in the Summer” is a holiday detective story in which the Erjavec family goes to the seaside, only to be robbed. Was their money stolen by the cleaner, the plumber, the Italian neighbor, the pizza delivery person, the Montenegrin mafia, a clown, the director’s nephew, or even the police? Meanwhile, the members of the happy (or rather comical) Erjavec family also begin to suspect one another.
</description>
<pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2104</guid>
<dc:date>2026-04-13T00:00:00Z</dc:date>
</item>
<item>
<title>Monitor corpus of Slovene Trendi 2026-03</title>
<link>http://hdl.handle.net/11356/2103</link>
<description>Monitor corpus of Slovene Trendi 2026-03
Kosem, Iztok; Čibej, Jaka; Dobrovoljc, Kaja; Erjavec, Tomaž; Ljubešić, Nikola; Ponikvar, Primož; Šinkec, Mihael; Krek, Simon
The Trendi corpus is a monitor corpus of Slovene. It contains news articles from 106 media websites, published by 60 publishers. Trendi 2026-03 covers the period from January 2019 to March 2026, complementing the Gigafida 2.2 reference corpus of written Slovene (http://hdl.handle.net/11356/2106).&#13;
&#13;
The contents of the Trendi corpus are obtained using the Jožef Stefan Institute Newsfeed service (http://newsfeed.ijs.si/). The texts have been annotated with the CLASSLA-Stanza pipeline (https://github.com/clarinsi/classla), including syntactic parsing according to Universal Dependencies (https://universaldependencies.org/sl/) and named entity recognition (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf).&#13;
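
The Trendi annotations themselves are available only through the concordancers, but CLASSLA-Stanza can export its annotations in CoNLL-U, the tab-separated format used by Universal Dependencies. The following is a minimal standard-library sketch that reads one CoNLL-U sentence and merges BIO named-entity tags into spans. It assumes the tags are stored in the MISC column as NER=B-PER, NER=I-PER, NER=O and so on. The sample sentence and its annotations are invented for illustration.

```python
# Column names of a CoNLL-U token line, in the order defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Return the token lines of one CoNLL-U sentence as dicts.

    Comment lines (#...) are skipped, as are multiword-token ranges (e.g. "1-2")
    and empty nodes (e.g. "1.1"), so only syntactic words remain.
    """
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if "-" in columns[0] or "." in columns[0]:
            continue
        tokens.append(dict(zip(FIELDS, columns)))
    return tokens


def misc_value(token, key):
    """Return the value of key=value in the pipe-separated MISC column, or None."""
    for item in token["misc"].split("|"):
        name, sep, value = item.partition("=")
        if sep and name == key:
            return value
    return None


def entities(tokens, key="NER"):
    """Merge BIO tags (B-X, I-X, O) read from MISC into (label, text) spans.

    The MISC key "NER" is an assumption about how the tags are stored.
    """
    spans = []
    current = None  # [label, list of word forms] for the span being built
    for tok in tokens:
        tag = misc_value(tok, key) or "O"
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(tok["form"])  # continue the open span
            continue
        if current is not None:  # any other tag closes the open span
            spans.append((current[0], " ".join(current[1])))
            current = None
        if tag != "O":  # B-X (or a stray I-X) opens a new span
            current = [tag[2:], [tok["form"]]]
    if current is not None:
        spans.append((current[0], " ".join(current[1])))
    return spans


# Invented sample sentence; XPOS and FEATS are left empty ("_") for brevity.
ROWS = [
    ["1", "Janez", "Janez", "PROPN", "_", "_", "3", "nsubj", "_", "NER=B-PER"],
    ["2", "Novak", "Novak", "PROPN", "_", "_", "1", "flat:name", "_", "NER=I-PER"],
    ["3", "dela", "delati", "VERB", "_", "_", "0", "root", "_", "NER=O"],
    ["4", "v", "v", "ADP", "_", "_", "5", "case", "_", "NER=O"],
    ["5", "Ljubljani", "Ljubljana", "PROPN", "_", "_", "3", "obl", "_", "NER=B-LOC|SpaceAfter=No"],
    ["6", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "NER=O"],
]
SAMPLE = "# text = Janez Novak dela v Ljubljani.\n" + "\n".join("\t".join(r) for r in ROWS)

tokens = parse_conllu(SAMPLE)
print(entities(tokens))  # [('PER', 'Janez Novak'), ('LOC', 'Ljubljani')]
```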
&#13;
An important addition is the topic (thematic category) automatically assigned to each text. There are 13 categories altogether: Arts and culture, Crime and accidents, Economy, Environment, Health, Leisure, Politics and Law, Science and Technology, Society, Sports, Weather, Entertainment, and Education. Text classification uses the following models: the text classification model SloBERTa-Trendi-Topics 1.0 (http://hdl.handle.net/11356/1709), the text classification model fastText-Trendi-Topics 1.0 (http://hdl.handle.net/11356/1710), and the SloBERTa model on Hugging Face (https://huggingface.co/cjvt/sloberta-trendi-topics).&#13;
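
The description does not specify how the classifier outputs are turned into one label per text. Purely as an illustration, assuming a classifier that returns one score per category, a single topic could be picked by taking the highest score among the 13 categories listed above. The scores in the example are made up, and this is not necessarily how the Trendi topics were assigned.

```python
from collections import Counter

# The 13 topic categories listed in the Trendi description.
TOPICS = (
    "Arts and culture", "Crime and accidents", "Economy", "Environment",
    "Health", "Leisure", "Politics and Law", "Science and Technology",
    "Society", "Sports", "Weather", "Entertainment", "Education",
)


def assign_topic(scores):
    """Pick the highest-scoring category from a {category: score} mapping.

    Assumes the classifier yields one score per category; raises ValueError
    for an empty mapping or for labels that are not Trendi categories.
    """
    if not scores:
        raise ValueError("no scores given")
    unknown = set(scores) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return max(scores, key=scores.get)


def topic_distribution(labels):
    """Count assigned topics over many texts, reporting zero for unused categories."""
    counts = Counter(labels)
    return {topic: counts[topic] for topic in TOPICS}


# Made-up scores for a single text.
print(assign_topic({"Sports": 0.81, "Leisure": 0.12, "Health": 0.07}))  # Sports
```

Keeping the category list as a fixed tuple lets the sketch reject labels that are not Trendi categories and report zero counts for topics that never occur.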
&#13;
The corpus is currently not available as a downloadable dataset due to copyright restrictions, but we hope to make at least part of it available in the near future. The corpus is accessible through the CLARIN.SI concordancers. If you would like to use the dataset for research purposes, please contact Iztok Kosem (iztok.kosem@ijs.si).&#13;
&#13;
This version adds texts from March 2026.
</description>
<pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://hdl.handle.net/11356/2103</guid>
<dc:date>2026-04-02T00:00:00Z</dc:date>
</item>
</channel>
</rss>
