Common Language Resources and Technology Infrastructure, Slovenia

CLARIN.SI Partners

Alpineon, d.o.o.

Alpineon is a Slovenian RTD-performing SME specializing in developing state-of-the-art computer vision and speech-technology products. Alpineon’s RTD team has extensive experience in hardware and software development, including CTI applications, VoIP devices and services, biometric technologies (speaker and face recognition), image processing (3-D vision) and speech technologies: Slovenian text-to-speech synthesis (TTS), automatic speech recognition (ASR), speech-to-speech translation (STS), voice portal applications etc.

Alpineon’s RTD team is involved in several international and national research projects in the fields of language technologies, image processing and biometrics. It consists of 14 researchers and developers including 7 PhD holders. Alpineon is the recipient of the Slovenian Award for Technical Innovations for Disabled Persons in 2003, the award for Outstanding Research Achievements by the Slovenian Research Agency in 2013, the winner of the international ICB 2013 face recognition challenge and the ICB 2013 speaker recognition challenge and the recipient of the Slovenian Ambassador of Privacy award in 2014 for best practices in privacy protection.

Alpineon has been a member of CLARIN since 2007 and is a founding member of CLARIN.SI. Alpineon contributes to CLARIN with language resources and speech technology engines.

Representative: Jerneja Žganec Gros
Substitute representative: Boštjan Vesnicer

Amebis, d.o.o.

The Amebis Company was established in 1991 for software development and production in the fields of language technologies and electronic publishing. The primary objective is to create core technologies (modules) and products for general use. The main areas of development and products are:

  • Corpora: development, creation and management of text and speech corpora, including more than 10 of the largest Slovenian corpora, with the 1-billion-word Gigafida as the largest.
  • Language processing: various language processing modules for Slovenian and some other languages, which are available as plug-ins for various program packages (MS Office, Lotus Notes, SAS etc.): spell-checkers, hyphenators, lemmatizers, word form generators, grammar checkers.
  • Machine translation systems: developers of Presis, a rule-based translation system for Slovenian, which is part of the iTranslate4 translation system.
  • Speech synthesis: the Govorec speech synthesizer (developed together with JSI) and the high-quality speech synthesizer eBralec (with Alpineon and JSI) for the Slovenian language.
  • Dialogue systems: SecondEgo is a platform for creating and managing virtual agents, which help websites and applications simplify communication with end users in different natural languages.
  • Electronic dictionaries: preparation of more than 160 electronic and 80 paper dictionaries. The best known are the portals Termania and Fran.

Amebis was also a partner in several EU projects.

Amebis was one of the founding members of CLARIN.SI and contributes to CLARIN primarily through language resources and language technology applications.

Representative: Miro Romih
Substitute representative: Peter Holozan

Domestic Research Society

The Domestic Research Society (DRS) was established in 2004 by visual artists Alenka Pirman and Damijan Kracina, and art historian Jani Pirnat. Under the motto “Nothing domestic is foreign to us” the society documents, collects, researches and presents domestic phenomena as installations, exhibitions or web projects related to the field of culture, heritage, contemporary art, and science.

Launched in December 2004 by the Domestic Research Society, Razvezani jezik [The Unleashed Tongue] is the first user-generated online dictionary of spoken Slovenian language. A web project created with Wiki technology, it permits easy editing of articles in the dictionary and allows every visitor to add new entries freely. It quickly gained popularity and was labelled “the most entertaining Slovenian dictionary” by the media. The anonymous users contribute slang, neologisms, idioms, euphemisms etc. from all Slovene regions and belong to different generations and subcultures.

The wiki was developed by Luka Prinčič (programming) and Alenka Pirman (information architecture). Since 2014 it has run on the Ljudmila Art and Science Laboratory’s server.

All the content in the dictionary is licensed under the Creative Commons Attribution-Share Alike licence. The derivatives (the e-book from 2006 and the paperback books from 2007 and 2014) bear the same licence.

DRS has been a member of the CLARIN.SI Consortium since September 2014 and contributes to the goals of CLARIN primarily with its on-line dictionary and outreach activities.

Representative: Alenka Pirman
Substitute Representative: Jani Pirnat

Jožef Stefan Institute

The Jožef Stefan Institute (JSI) is the leading Slovenian research organization for basic and applied research in natural sciences and technology with over 900 employees. Three organisational units are involved in developing and maintaining the CLARIN.SI infrastructure:

  • The Department of Knowledge Technologies performs research in advanced information technologies, aimed at acquiring, storing and managing knowledge to be used in the development of knowledge based applications. Established areas include intelligent data analysis, text and web mining, language technologies and computational linguistics, decision support and knowledge management. In the area of language technologies, the Department is one of the leading (and oldest) Slovenian centres for the development of language resources and annotation tools, esp. in the area of standardisation of resource encoding and linguistic formalisms and in open accessibility of resources. The Department was also among the first to develop and promote the area of Digital Humanities in Slovenia.
  • The Artificial Intelligence Laboratory is concerned with research and development in information technologies with an emphasis on artificial intelligence. The main research areas are data analysis with an emphasis on text, web and cross-modal data; scalable real-time data analysis; visualization of complex data; semantic technologies; and language technologies. The Laboratory has been involved in many EU projects in the area of text analytics and processing, where their task is mainly in providing technologies for knowledge extraction from text and for machine translation. The Laboratory also puts special emphasis on the promotion of science. In collaboration with the Centre for Knowledge Transfer in Information Technologies (CT3) they are developing the award-winning VideoLectures.NET educational portal and organizing the national ACM competition in Computer Science (in Slovene).
  • The Networking Infrastructure Centre manages the networking and hardware infrastructure at JSI. It is active in the areas of trust and authentication, e.g. it maintains the JSI IdP service and is actively involved in EduGain and other EU efforts in these areas.

The JSI is the host of the CLARIN.SI research infrastructure. It coordinates the work of the infrastructure, maintains and develops its repository and services, provides language resources and tools etc.

Representative: Tomaž Erjavec (National coordinator)
Substitute representatives: Simon Krek, Katja Zupan

Institute of Contemporary History

The Institute of Contemporary History (ICH) is the central national institution for historiographical research of the period from the 19th century to today. The Institute is one of the most important institutions in the Digital Humanities in Slovenia, and is the national coordinating institution for DARIAH-SI, a member of DARIAH ERIC.

One of the three research programmes of the Institute is the Research Infrastructure of the Slovenian Historiography, which maintains the SIstory Web portal of Slovenian Historiography. The RI performs digitisation of material of historic importance and management of born-digital content. Another task is online publishing of basic sources for historiographical research and literature from the field, regardless of form or format, and thus not limited to textual sources. Since textual files represent the majority of its digital archival holdings, the focus is on advanced text mark-up, using the TEI Guidelines and relevant XML technologies. Large textual corpora are being created following this process, based mainly on the stenographical minutes of different Slovene legislative bodies.

ICH was one of the founding members of CLARIN.SI and contributes to CLARIN primarily by making available its large and richly encoded text collections of recent Slovenian historical sources and by serving as the primary liaison to DARIAH(-SI).

Representative: Andrej Pančur
Substitute representatives: Mojca Šorn, Jurij Hadalin

Slovenian Language Technologies Society

The Slovenian Language Technologies Society (SDJT) was founded in 1998 and brings together people working on language technologies from the scientific, educational or user perspectives. The activities of the SDJT are aimed at promoting the development of language technologies for the Slovenian language. In 2011 the society was awarded the special status of a research institution working in the public interest. The society has 120 members.

The main activities of SDJT are its monthly JOTA lecture series on NLP-related topics and the biennial conference on Slovene language technologies. The society also organizes educational events, such as the ESSLLI Summer School on Language, Logic and Information, the TransTech summer school on translation technologies and seminars on corpora and on-line language resources for Slovene for secondary and primary school teachers.

SDJT is a founding member of the CLARIN.SI consortium, and contributes to the goals of CLARIN mainly by outreach activities, such as seminars, tutorials and conference organisation.

Representative: Darja Fišer
Substitute representative: Katja Zupan

Trojina, Institute for Applied Slovene Studies

Trojina, Institute for Applied Slovene Studies, is a private non-profit research institute founded in 2004 with offices in Ljubljana. Its staff are experts in the fields of Slovene studies, lexicography, corpus linguistics, language technologies, language didactics, and translation. Trojina is the leading Slovenian institution for applied research in linguistics and in the development of didactic language technologies. The Institute has, in cooperation with other Slovenian institutions, developed several corpora and corpus-based tools that are used by researchers, language teachers and learners, and has conducted various nationally important corpus-based studies.

The Institute has recently led two related and highly innovative applied projects in Slovenian language research, namely the compilation of the Šolar corpus and the creation of the Pedagogical Grammar Portal, which focused largely on developing a methodology for creating corpus-based teaching and learning materials from the analysis of errors in learner written production. In addition to developing resources and tools, Trojina regularly participates in projects that promote the use of language technologies among various types of users, e.g. researchers and teachers. For example, the JTIU project (2012-2014), led by the SDJT (Slovenian Language Technologies Society), involved delivering workshops on language resources and technologies for Slovene to elementary and secondary school teachers across Slovenia. In 2014-2015, Trojina developed the Portal of language resources, where lay users can learn about language resources and tools for Slovene, including by viewing short instructional videos with Slovenian and English subtitles. In 2017, the portal was showcased on CLARIN.eu.

The Centre for Applied Linguistics is a recently established national Infrastructure Centre at Trojina and is dedicated to conducting applied research and offering language-technology support to researchers and research programmes conducting research in the field of applied linguistics, especially research into literacy, and literacy-related studies in humanities and sociology.

Trojina is a founding member of the CLARIN.SI consortium, and contributes to the goals of CLARIN by outreach activities, as well as language resources and tools.

Representative: Iztok Kosem
Substitute representative: Kaja Dobrovoljc

University of Ljubljana

The University of Ljubljana is the largest Slovenian University, where corpus linguistics and language technologies are coordinated by its Centre for Language Resources and Technologies (CJVT UL). The Centre is a research unit of the University dedicated to scientific research of language, creation and maintenance of practically useful digital language resources and technologies for modern Slovene language, available on the web to all Slovene language users. Research areas include description of modern Slovene language and computer-aided learning and teaching of Slovene and foreign languages. Practical tasks include continuous and user-friendly access to corpora, lexical, terminological and other databases, creation and maintenance of web-based language learning and teaching environments, as well as distribution of publicly financed and open source language resources and tools.

The Centre is organized within the Network of research infrastructure centres at the University of Ljubljana, which is financed by the Slovenian Research Agency (ARRS). The collaborating partners in the Centre are the Faculty of Social Sciences, Faculty of Arts, Faculty of Education, Faculty of Electrical Engineering, and Faculty of Computer and Information Science.

An important aim of the Centre is to provide publicly available information about Slovene language resources and language technologies in Slovenia in order to enhance their public perception and to support the dissemination of language resources and tools. These aims are strongly related to CLARIN’s mission.

Representative: Monika Kalin Golob
Substitute representatives: Marko Robnik Šikonja, Nataša Logar, Karmen Pižorn, Špela Vintar, France Mihelič

University of Maribor

The University of Maribor is the second largest and the second oldest university in Slovenia with about 18 000 students. It has seventeen faculties with undergraduate and postgraduate programmes. The University of Maribor is a regional developer and its faculties are located not only in the city of Maribor, but also in other parts of Slovenia. Two of its Faculties are the most active developers and users of language technologies and resources.

The Faculty of Electrical Engineering and Computer Science is devoted to performing education and research within the fields of electrical engineering, computer science, information technology, communications, media, telecommunications and mechatronics. It has contributed to various projects with the development of language resources or language processing tools for Slovene.

The Faculty of Arts is the youngest member of the University of Maribor, established in 2006, although its study programmes and departments were developed much earlier within the framework of the Faculty of Pedagogy. It covers the fields of Slavic studies, English and American studies, German studies, Hungarian studies, history, geography, philosophy, sociology, psychology and pedagogy.

The University of Maribor has been a leader or a member of various language technology projects, thus contributing to the goals of CLARIN by developing language resources, such as databases for speech recognition, or language technology tools, and making them available via the CLARIN.SI repository. The members of the Faculty of Arts are also potential users of the CLARIN services.

Representative: Darinka Verdonik
Substitute representatives: Milan Ojsteršek, Andrej Žgank

University of Nova Gorica

The University of Nova Gorica is a young (est. 1995, university accreditation in 2005) and growing private research-oriented university in Slovenia comprising seven schools and twelve research centers. The University is a member of the European University Association (including the EUA Council for Doctoral Education). Despite its still relatively small size (approx. 100 PhD holders), the University of Nova Gorica has hosted numerous nationally and European-funded research projects, including a €4,000,000 FP7-REGPOT grant awarded in 2011; it collaborates with over 40 European and international universities and research centers, and participates in academic and research exchange programs (ERASMUS, COST) and in an EC Erasmus Mundus joint study program. It has also been a member of CLARIN.SI since 2015.

Activities related to CLARIN.SI are conducted in UNG’s Center for Cognitive Science of Language. The unit currently employs 6 PhD holders (four senior and two postdoctoral researchers) who specialize in formal theoretical and experimental linguistics, but are also involved in various applied linguistics activities, from conducting language-planning studies for the Slovenian Ministry of Culture to maintaining an online language consultancy. Through its Center for Cognitive Science of Language, UNG contributes to CLARIN with its language resources.

Representative: Rok Žaucer
Substitute representative: Franc Marušič

University of Primorska

The University of Primorska covers the Slovenian Littoral region with Faculties and Institutes in Koper, Izola, and Portorož. It has around 5,000 students. Two of its Faculties and one Institute are the most active developers and users of language technologies and resources.

The Faculty for Mathematics, Natural Sciences and Information Technologies offers undergraduate and postgraduate study programmes in mathematics, computer science, natural sciences and biotechnical sciences. Language technologies are mostly used and researched at the Department of Information Sciences and Technologies. Most of the research is done in the fields of Machine Translation and Knowledge discovery.

The Faculty of Humanities offers both undergraduate and postgraduate degree courses as well as engaging in scientific and specialist activities in the field of humanities, arts and social studies. Language technologies are mostly used and researched with the collaboration of the Institute for Linguistic Research of the Science Research Centre. Most of the research is done in the field of corpus based language studies and corpus construction.

The University of Primorska has been a leader or a member of various language technology projects. Its contributions to CLARIN are the provision of domain-specific language corpora and dictionaries and its use of the CLARIN resources and services.

Representative: Jernej Vičič

Scientific and Research Centre of the Slovenian Academy of Sciences and Arts

The Research Centre of the Slovenian Academy of Sciences and Arts (ZRC SAZU) is the leading Slovenian research centre in the humanities and a cutting-edge academic institution in Central, Eastern, and Southeastern Europe. It has a multidisciplinary character; in addition to the humanities, its spheres of research also cover the natural and social sciences. It conducts research on a broad variety of topics connected with natural and cultural heritage of Slovenia. In its present form, the ZRC SAZU is a network of eighteen institutes with over 300 researchers and technical specialists.

The largest among the institutes is the Fran Ramovš Institute of the Slovenian Language. It is the national centre for systematic monitoring and description of the Slovenian language. The Institute was established in 1945 for the purpose of compiling linguistic materials and using them for the creation of comprehensive and authoritative Slovenian language resources, primarily dictionaries: orthographic dictionaries, dictionaries of contemporary standard Slovenian, terminological dictionaries, etymological dictionaries, historical dictionaries, dialectal dictionaries, linguistic atlases as well as descriptive and historical studies in linguistics. All monolingual comprehensive dictionaries of Slovenian and many applied ones have been compiled at the Institute of the Slovenian Language. Since 2000, the Institute has published 38 dictionaries on 18,402 pages, 73 monographs on 21,102 pages, and 36 issues of journals on 7,785 pages. The majority of these works are also available online with free access.

ZRC SAZU was one of the founding members of CLARIN.SI and contributes to the activities of CLARIN through language resources and consultancy on the Slovenian language.

Representative: Mateja Jemec Tomazin
Substitute representatives: Helena Dobrovoljc, Jerneja Fridl, Nina Ledinek
