{"id":8202,"date":"2025-05-29T09:42:12","date_gmt":"2025-05-29T09:42:12","guid":{"rendered":"https:\/\/www.clarin.si\/info\/?page_id=8202"},"modified":"2025-08-20T07:20:29","modified_gmt":"2025-08-20T07:20:29","slug":"llms4ssh-clarin-k-centre-for-large-language-models-in-ssh","status":"publish","type":"page","link":"https:\/\/www.clarin.si\/info\/k-centres\/llms4ssh-clarin-k-centre-for-large-language-models-in-ssh\/","title":{"rendered":"LLMs4SSH: Knowledge Centre for Large Language Models in SS&#038;H"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">CLARIN.SI is a member of <\/span><a href=\"https:\/\/llms4ssh.clarin-pl.eu\/\" target=\"_blank\" rel=\"noopener\"><b>LLMs4SSH<\/b>, the CLARIN K-centre for Large Language Models for Social Sciences and Humanities<\/a><span style=\"font-weight: 400;\">. The LLMs4SSH Centre offers expertise on various applications of LLMs in processing language data and on expansion and adaptation of LLMs to the needs of researchers from Social Sciences and Humanities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On this page, we provide the key information on current activities related to large language models (LLMs) in Slovenia.<\/span><\/p>\n\n<h2><strong>Key Projects Focusing on LLMs<\/strong><\/h2>\n<ul>\n<li><a href=\"https:\/\/www.cjvt.si\/povejmo\/en\/project\/\" target=\"_blank\" rel=\"noopener\"><b>PoVeJMo (Adaptive Natural Language Processing with Large Language Models)<\/b><\/a><span style=\"font-weight: 400;\">: This national project is developing the first large language models specifically tailored to the Slovenian language. The resulting models are openly available as <\/span><a href=\"https:\/\/huggingface.co\/collections\/cjvt\/gams-680a34e63dc760cd6fdc604c\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">GaMS models<\/span><\/a><span style=\"font-weight: 400;\">. 
Within the project, the models will serve as the foundation for advanced applications in the fields of medicine, humanities, industrial environments, and software development.<\/span><\/li>\n<li><a href=\"https:\/\/www.cjvt.si\/llm4dh\/en\" target=\"_blank\" rel=\"noopener\"><b>LLM4DH (Large Language Models for Digital Humanities)<\/b><\/a><span style=\"font-weight: 400;\">: This national project focuses on extensive evaluation and benchmarking of LLMs for Slovenian, their application to research in humanities fields (linguistics and lexicography, education, contemporary history, folkloristics, and law), and the development of visual LLMs for Slovenian.<\/span><\/li>\n<li><a href=\"https:\/\/cordis.europa.eu\/project\/id\/101186647\" target=\"_blank\" rel=\"noopener\"><b>AI4DH (Centre of Excellence in Artificial Intelligence for Digital Humanities)<\/b><\/a><span style=\"font-weight: 400;\">: This EU-funded project aims to establish the University of Ljubljana (Slovenia) as a leading institution in Europe for AI applications in digital humanities (DH). The project will set up a Centre of Excellence that combines top-tier AI research with support for DH scholars, enhancing their ability to leverage AI opportunities.<\/span><\/li>\n<li><a href=\"https:\/\/alt-edic.eu\/projects\/alt-edic4eu\/\" target=\"_blank\" rel=\"noopener\"><b>ALT-EDIC4EU (Alliance for Language Technologies for the European Union)<\/b><\/a><span style=\"font-weight: 400;\">: This EU-funded project aims to facilitate the development of a robust and scalable infrastructure and operations for the Alliance for Language Technologies (ALT-EDIC), in order to support the federation of the European Language Technology ecosystem. 
The project involves experts, institutions and industries from strategic domains, including the Jo\u017eef Stefan Institute from Slovenia.\u00a0<\/span><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/alt-edic.eu\/projects\/llms4eu\/\" target=\"_blank\" rel=\"noopener\"><b>LLMs4EU (Large Language Models for the European Union)<\/b><\/a><span style=\"font-weight: 400;\">: This EU-funded project aims to establish a one-stop shop for language data to generate value for developers of LLMs, a cutting-edge platform for the transparent evaluation and benchmarking of LLMs in European languages, and to develop language models tailored to specific languages, sectors, and use cases from diverse application domains (energy, telecom, tourism, public services, and science).<\/span> <span style=\"font-weight: 400;\">The project is carried out by a broad consortium of leading research centres and companies specializing in language data management, LLMs and language technologies, with some of the core partners coming from Slovenia.<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><strong>Benchmarks for LLMs in Slovenian<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">The following benchmarks enable evaluation of large language models in Slovenian language:<\/span><\/p>\n<ul>\n<li aria-level=\"1\"><a href=\"https:\/\/slobench.cjvt.si\/\" target=\"_blank\" rel=\"noopener\"><b>SloBENCH (Slovenian NLP Benchmark)<\/b><\/a><span style=\"font-weight: 400;\">:<\/span> <span style=\"font-weight: 400;\">This evaluation platform enables benchmarking the Slovenian natural language processing technologies on the following tasks: natural language inference (NLI), machine translation (between English and Slovenian), speech recognition (ASR), named entity recognition (NER) and dependency parsing. 
It also includes the Slovenian Winograd Schema Challenge (WSC) dataset and SuperGLUE benchmarks.<\/span><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/datasets\/cjvt\/slovenian-llm-eval\" target=\"_blank\" rel=\"noopener\"><b>Slovenian LLM eval<\/b><\/a><span style=\"font-weight: 400;\">: Set of benchmarks (ARC Challenge, ARC Easy, BoolQ, HellaSwag, NQ Open, OpenBookQA, PIQA, TriviaQA, Winogrande) for evaluating Slovenian language models, building upon the work of Aleksa Gordi\u0107 who translated some of the popular English benchmarks into Slovenian via machine translation. The authors have further improved the quality of these automatic Slovenian translations.<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><strong>Benchmarks and Datasets for LLM evaluation in South Slavic Languages<\/strong><\/h2>\n<p><a href=\"http:\/\/clarin.si\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLARIN.SI<\/span><\/a><span style=\"font-weight: 400;\"> also works intensively on various South Slavic languages via its <\/span><a href=\"http:\/\/www.clarin.si\/info\/k-centre\/\" target=\"_blank\" rel=\"noopener\"><b>CLASSLA Knowledge Centre for South Slavic languages<\/b><\/a><span style=\"font-weight: 400;\">. As part of that work, a series of benchmarks have been developed for numerous languages. 
We list the most prominent ones:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/benchich\" target=\"_blank\" rel=\"noopener\"><b>BENCHi\u0107 benchmarking platform<\/b><\/a><span style=\"font-weight: 400;\"> for Croatian, Serbian, Bosnian, and Macedonian, covering named entity recognition (NER), sentiment identification, commonsense reasoning and language identification.<\/span><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/dialect-copa\" target=\"_blank\" rel=\"noopener\"><b>DIALECT-COPA<\/b><\/a><span style=\"font-weight: 400;\">: commonsense reasoning in South Slavic languages and dialects (Slovenian, Cerkno, Croatian, Chakavian, Serbian, Torlak, Macedonian)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/TajaKuzman\/IPTC-Media-Topic-Classification\" target=\"_blank\" rel=\"noopener\"><b>IPTC news topic classification<\/b><\/a><span style=\"font-weight: 400;\"> (Slovenian, Croatian, Greek, Catalan)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/TajaKuzman\/AGILE-Automatic-Genre-Identification-Benchmark\" target=\"_blank\" rel=\"noopener\"><b>AGILE benchmark on text genre identification<\/b><\/a><span style=\"font-weight: 400;\"> (Slovenian, Croatian, Macedonian, English, Albanian, Catalan, Greek, Icelandic, Maltese, Turkish, and Ukrainian)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/orgs\/UniversalNER\/repositories\" target=\"_blank\" rel=\"noopener\"><b>UniversalNER benchmark<\/b><\/a><span style=\"font-weight: 400;\"> for many languages, including Croatian and Serbian<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/datasets\/classla\/ParlaSent\" target=\"_blank\" rel=\"noopener\"><b>ParlaSent sentiment identification dataset in 
parliamentary debates<\/b><\/a><span style=\"font-weight: 400;\"> (Slovenian, Croatian, Bosnian, Serbian, Czech, Slovak, English)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/clarinsi.github.io\/parlaspeech\/\" target=\"_blank\" rel=\"noopener\"><b>ParlaPause benchmark on filled pause detection in speech<\/b><\/a><span style=\"font-weight: 400;\"> (Slovenian, Croatian, Serbian, Czech, Polish)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/datasets\/classla\/mak_na_konac\" target=\"_blank\" rel=\"noopener\"><b>Mak Na Konac automatic speech recognition benchmark<\/b><\/a><span style=\"font-weight: 400;\"> for Croatian and Serbian<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/datasets\/classla\/Mici_Princ\" target=\"_blank\" rel=\"noopener\"><b>Mi\u0107i Princ automatic speech recognition benchmark<\/b><\/a><span style=\"font-weight: 400;\"> for the Chakavian dialect of Croatian<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For an overview of freely-available datasets, including general text collections, and training and test datasets for various NLP tasks, see the Frequently-Asked Questions (FAQ) for <\/span><a href=\"http:\/\/www.clarin.si\/info\/k-centre\/faq4slovene\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Slovenian<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/www.clarin.si\/info\/k-centre\/faq4croatian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Croatian,<\/span><\/a> <a href=\"http:\/\/www.clarin.si\/info\/k-centre\/faq4serbian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Serbian<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/www.clarin.si\/info\/k-centre\/faq4bulgarian\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 
400;\">Bulgarian<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/www.clarin.si\/info\/k-centre\/faq4macedonian\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Macedonian<\/span><\/a><span style=\"font-weight: 400;\"> language, curated by the <\/span><a href=\"http:\/\/www.clarin.si\/info\/k-centre\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLASSLA Knowledge Centre<\/span><\/a><span style=\"font-weight: 400;\">. The FAQ also provides information about resources and technologies for linguistic annotation of South Slavic texts.<\/span><\/p>\n<h2><strong>Large Language Models and Other Language Technologies for Slovenian<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">The main sites where you can find open-source large language models and language technologies for Slovenian are:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLARIN.SI repository<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/cjvt\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CJVT organization profile at Hugging Face<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/huggingface.co\/classla\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLASSLA organization profile at Hugging Face<\/span><\/a><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The <\/span><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">overview of openly-available large language models, speech technologies, and other natural-language processing (NLP) technologies for Slovenian language<\/span><\/a><span style=\"font-weight: 400;\">, curated by the 
CLASSLA Knowledge Centre, provides information on openly-available technologies for Slovenian, benchmarks and papers on:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#generative-models-llms-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">generative models (LLMs)<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#embedding-models--rag-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">embedding models &amp; RAG<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#machine-translation-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">machine translation systems<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#bert-like-pretrained-models-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">BERT-like pretrained models<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#fine-tuned-models-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">fine-tuned models for Slovenian<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#speech-technologies-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">speech technologies for Slovenian<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a 
href=\"https:\/\/github.com\/clarinsi\/Slovenian-Language-Technologies-Overview\/#other-language-technologies-for-slovenian\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">other language technologies for Slovenian<\/span><\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><strong>Contact Us and Stay Updated<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">If you have any questions related to large language models, language technologies or language resources, the CLASSLA Knowledge Centre has a <\/span><b>helpdesk<\/b><span style=\"font-weight: 400;\"> dedicated to these topics for South Slavic languages. It can be contacted via <\/span><b>helpdesk.classla@clarin.si<\/b><span style=\"font-weight: 400;\">. The helpdesk offers additional clarifications regarding the CLASSLA documentation and support in using, modifying, producing, or publishing resources and technologies for South Slavic languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can <\/span><a href=\"https:\/\/mailman.ijs.si\/mailman\/listinfo\/classla\" target=\"_blank\" rel=\"noopener\"><b>subscribe to the CLASSLA mailing list here<\/b><\/a><span style=\"font-weight: 400;\"> to be informed of new resources, technologies, events and projects for Slovenian and other South Slavic languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Stay updated on the latest activities of the CLASSLA Knowledge Centre and the CLARIN.SI infrastructure, which are the Slovenian members of the LLMs4SSH Knowledge Centre, by following:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">CLARIN.SI on <\/span><a href=\"https:\/\/x.com\/ClarinSlovenia\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">X<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/www.linkedin.com\/company\/clarin-si\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">LinkedIn<\/span><\/a><\/li>\n<li 
style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">the <\/span><a href=\"https:\/\/discord.com\/invite\/vQDRpGMU7C\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Discord group \u201cSlovenska skupnost za jezikovne vire in tehnologije\u201d<\/span><\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><strong>The CLARIN.SI Team in LLMs4SSH<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">The main CLARIN.SI members participating in the LLMs4SSH Knowledge Centre are Simon Krek, Nikola Ljube\u0161i\u0107, \u0160pela Vintar, and Taja Kuzman Punger\u0161ek.<\/span><\/p>\n<p><a href=\"https:\/\/www.simonkrek.si\/en\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Dr Simon Krek<\/span><\/a><span style=\"font-weight: 400;\"> is a researcher from the <\/span><a href=\"http:\/\/ailab.ijs.si\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Department for Artificial Intelligence at the Jo\u017eef Stefan Institute<\/span><\/a><span style=\"font-weight: 400;\"> and the head of the <\/span><a href=\"https:\/\/www.cjvt.si\/en\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Centre for Language Resources and Technologies (CJVT) at the University of Ljubljana<\/span><\/a><span style=\"font-weight: 400;\">, Slovenia. His research fields are lexicography and lexicogrammar, corpus linguistics, natural language processing, language technology infrastructure and computer-aided language learning and teaching. He has coordinated major Slovenian projects for language technologies (cf. 
<\/span><a href=\"http:\/\/eng.slovenscina.eu\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Communication in Slovene<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/rsdo.slovenscina.eu\/en\/about-project\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Development of Slovene in a Digital Environment<\/span><\/a><span style=\"font-weight: 400;\">). In addition to participating in numerous European projects (<\/span><a href=\"http:\/\/www.elda.org\/en\/projects\/archived-projects\/meta-net\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">META-NET<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/xlike.ijs.si\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">xLike<\/span><\/a><span style=\"font-weight: 400;\"> and others), he has led the H2020-funded <\/span><a href=\"http:\/\/www.elex.is\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ELEXIS<\/span><\/a><span style=\"font-weight: 400;\"> project (European Lexicographic Infrastructure). 
He is currently leading the <\/span><a href=\"https:\/\/www.cjvt.si\/povejmo\/en\/project\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">PoVeJMo<\/span><\/a><span style=\"font-weight: 400;\"> project that is focused on developing large language models for Slovenian, and is involved in several other major projects related to language technologies and large language models, including the <\/span><a href=\"https:\/\/mezzanine.um.si\/en\/mezzanine-english\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">MEZZANINE<\/span><\/a><span style=\"font-weight: 400;\"> project for Slovenian speech technologies, <\/span><a href=\"https:\/\/www.cjvt.si\/llm4dh\/en\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">LLM4DH<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/alt-edic.eu\/projects\/alt-edic4eu\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ALT-EDIC4EU<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/alt-edic.eu\/projects\/llms4eu\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">LLMs4EU<\/span><\/a><span style=\"font-weight: 400;\"> projects. 
He also serves as a deputy national coordinator of the <\/span><a href=\"https:\/\/www.clarin.si\/info\/about\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLARIN.SI<\/span><\/a><span style=\"font-weight: 400;\"> infrastructure.<\/span><\/p>\n<p><a href=\"https:\/\/nljubesi.github.io\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Dr Nikola Ljube\u0161i\u0107<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/taja-kuzman.notion.site\/Taja-Kuzman-8fdda29e5968470286b57421984ed21d\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Taja Kuzman Punger\u0161ek<\/span><\/a><span style=\"font-weight: 400;\"> are researchers from the <\/span><a href=\"https:\/\/kt.ijs.si\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Department of Knowledge Technologies at the Jo\u017eef Stefan Institute<\/span><\/a><span style=\"font-weight: 400;\">, Slovenia. Their research interests cover a broad spectrum of natural language processing (NLP) tasks including web corpus construction (cf., <\/span><a href=\"https:\/\/macocu.eu\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">MaCoCu<\/span><\/a><span style=\"font-weight: 400;\"> project and <\/span><a href=\"https:\/\/aclanthology.org\/2024.lrec-main.291\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLASSLA-web<\/span><\/a><span style=\"font-weight: 400;\"> corpora), development of tools for automatic linguistic annotation for South Slavic languages (cf. 
<\/span><a href=\"https:\/\/github.com\/clarinsi\/classla\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLASSLA-Stanza pipeline<\/span><\/a><span style=\"font-weight: 400;\">), specialization of language models for under-resourced languages (cf., <\/span><a href=\"https:\/\/huggingface.co\/classla\/bcms-bertic\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">BERTi\u0107<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/huggingface.co\/classla\/xlm-r-bertic\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">XLM-R-BERTi\u0107<\/span><\/a><span style=\"font-weight: 400;\"> models), development of speech corpora and automatic speech recognition models (cf. <\/span><a href=\"https:\/\/clarinsi.github.io\/parlaspeech\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ParlaSpeech<\/span><\/a><span style=\"font-weight: 400;\"> corpora and <\/span><a href=\"https:\/\/mezzanine.um.si\/en\/mezzanine-english\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">MEZZANINE<\/span><\/a><span style=\"font-weight: 400;\"> project), <\/span><a href=\"https:\/\/github.com\/clarinsi\/benchich\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">benchmarking South Slavic NLP technologies<\/span><\/a><span style=\"font-weight: 400;\">, application of NLP methods to South Slavic dialects (cf. <\/span><a href=\"https:\/\/sites.google.com\/view\/vardial-2024\/shared-tasks\/dialect-copa\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">VarDial DIALECT-COPA<\/span><\/a><span style=\"font-weight: 400;\"> shared task), and machine learning tasks, including hate speech detection (cf. 
<\/span><a href=\"http:\/\/imsypp.ijs.si\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">IMSyPP<\/span><\/a><span style=\"font-weight: 400;\"> project), <\/span><a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/handle\/11356\/1681\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">topic detection<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"https:\/\/www.mdpi.com\/2504-4990\/5\/3\/59\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">automatic genre identification<\/span><\/a><span style=\"font-weight: 400;\">. They are the leaders of the <\/span><a href=\"https:\/\/www.clarin.si\/info\/k-centre\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLASSLA centre (CLARIN Knowledge Centre for South Slavic languages)<\/span><\/a><span style=\"font-weight: 400;\">, which offers expertise on language resources and technologies for South Slavic languages, and members of <\/span><a href=\"http:\/\/clarin.si\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CLARIN.SI<\/span><\/a><span style=\"font-weight: 400;\"> management committee.<\/span><\/p>\n<p><a href=\"http:\/\/www.lojze.si\/spela\/\" target=\"_blank\" rel=\"noopener\">Dr \u0160pela Vintar<\/a> is a researcher at the <a href=\"https:\/\/www.ijs.si\/ijsw\/Center%20za%20mre\u017eno%20infrastrukturo\" target=\"_blank\" rel=\"noopener\">Centre of Network Infrastructure at the Jo\u017eef Stefan Institute<\/a> and full professor at the <a href=\"https:\/\/prevajalstvo.ff.uni-lj.si\/\" target=\"_blank\" rel=\"noopener\">Department of Translation Studies, Faculty of Arts, University of Ljubljana<\/a>. 
Her research interests span various areas of digital linguistics and language processing, including terminology and knowledge mining, where she was the leader of <a href=\"https:\/\/termframe.ff.uni-lj.si\/\" target=\"_blank\" rel=\"noopener\">TermFrame<\/a> which created a multilingual frame-based knowledge base, and a researcher at the <a href=\"https:\/\/nl.ijs.si\/janes\/\" target=\"_blank\" rel=\"noopener\">JANES<\/a> project exploring terminology in non-standard Slovenian; machine translation (involvement in the <a href=\"https:\/\/rsdo.slovenscina.eu\/en\" target=\"_blank\" rel=\"noopener\">Development of Slovene in the Digital Environment<\/a> project); sign language, where she was the leader of <a href=\"http:\/\/lojze.lugos.si\/signor\/en\" target=\"_blank\" rel=\"noopener\">SIGNOR<\/a>, and more recently also cognitive approaches to semantics and language modelling by heading the <a href=\"https:\/\/smallworldofwords.org\/sl\/project\" target=\"_blank\" rel=\"noopener\">SWOW-SL<\/a> word association <a href=\"http:\/\/hdl.handle.net\/11356\/1980\" target=\"_blank\" rel=\"noopener\">collection<\/a>, and the evaluation and benchmarking of LLMs within the <a href=\"https:\/\/www.cjvt.si\/llm4dh\/en\/\" target=\"_blank\" rel=\"noopener\">LLM4DH<\/a> project, where she explores nuanced language and bias. 
She is the founder and coordinator of the <a href=\"https:\/\/digiling.university\/\" target=\"_blank\" rel=\"noopener\">Joint Master in Digital Linguistics<\/a>, established on the basis of an awarded KA2-Erasmus+ project <a href=\"https:\/\/learn.digiling.eu\/\" target=\"_blank\" rel=\"noopener\">DigiLing: Trans-European e-learning hub for Digital Linguistics<\/a>.<\/p>\n<div id=\"themify_builder_content-8202\" data-postid=\"8202\" class=\"themify_builder_content themify_builder_content-8202 themify_builder\">\n    <\/div>\n<!-- \/themify_builder_content -->\n","protected":false},"excerpt":{"rendered":"<p>CLARIN.SI is a member of LLMs4SSH, the CLARIN K-centre for Large Language Models for Social Sciences and Humanities. The LLMs4SSH Centre offers expertise on various applications of LLMs in processing language data and on expansion and adaptation of LLMs to the needs of researchers from Social Sciences and Humanities. On this page, we provide the [&hellip;]<\/p>\n","protected":false},"author":13,"featured_media":0,"parent":6580,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-8202","page","type-page","status-publish","hentry","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"_links":{"self":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/8202","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/comments?post=8202"}],"version-history":[{"count":25,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/8202\/revisions"}],"predecessor-version":[{"id":8331,"href":"https:\/\/www.clarin.si\/inf
o\/wp-json\/wp\/v2\/pages\/8202\/revisions\/8331"}],"up":[{"embeddable":true,"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/pages\/6580"}],"wp:attachment":[{"href":"https:\/\/www.clarin.si\/info\/wp-json\/wp\/v2\/media?parent=8202"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}