On 13 June 2019, the Centre for Language Resources and Technologies, University of Ljubljana, presented Gigafida 2.0, a new version of the Gigafida corpus. The updated corpus is a reference corpus of written standard Slovene with 1.2 billion words from daily newspapers, magazines, selected web pages and books. The corpus can be accessed through the CJVT Resources portal, a dedicated web page or the CLARIN.SI concordancers.
Tanja Wissik (Centre of Digital Humanities of the Austrian Academy of Sciences) received a CLARIN Mobility Grant to visit the Jožef Stefan Institute in April 2019. Tanja initiated ParlAT, a corpus of Austrian parliamentary records, and wanted to convert it into a standard format such as TEI. This would make ParlAT interoperable with other datasets and thus enable efficient research across different parliamentary data and other corpora. During her research visit to Ljubljana on 14–19 April 2019, Tomaž Erjavec and Andrej Pančur helped Tanja analyse the corpus and proposed ways to convert it to TEI. You can read more about the visit in her CLARIN blog post.
We are happy to announce that as of 1 October 2018 Darja Fišer (University of Ljubljana and Jožef Stefan Institute) has been reappointed as CLARIN ERIC Director of User Involvement for another two years.
In her first term, which lasted from 1 October 2016 to 30 September 2018, Darja Fišer worked hard to bring CLARIN users, as well as the usability and usefulness of our infrastructure, to centre stage. With strong support from the National Coordinators' Forum, she succeeded in having National User Involvement Coordinators appointed in all CLARIN member countries. They act as promoters of User Involvement activities at the national level and also serve as a vital link for sharing information and experience related to the outreach and uptake of the infrastructure at the international level.
Three flagship initiatives were introduced during Darja's first term as User Involvement Director:
- CLARIN Resource Families: The goal of the initiative is to provide a systematic and comprehensive overview of the state of the infrastructure by focusing on the types of resources that are particularly relevant for a wide range of researchers from digital humanities, social sciences and human language technologies, such as parliamentary corpora, social media corpora and newspaper corpora. Such systematic and user-friendly overviews have proven highly valuable for internal use and directly useful for the users of the infrastructure. They have also shown CLARIN's potential for enrichment, led to tangible improvements in the metadata and functionalities of the Virtual Language Observatory, inspired the development of new resources, and fostered community building among researchers who work with specific data types.
- Tour de CLARIN: This initiative periodically highlights prominent User Involvement activities of a particular CLARIN national consortium. The highlights include a presentation of the national consortium and its flagship tools, resources and User Involvement events, as well as interviews with prominent researchers who have used the consortium's infrastructure in their research and can share their experience with CLARIN. Tour de CLARIN has helped to increase the visibility of the national consortia, revealed the richness of the CLARIN landscape, and displayed the full range of activities throughout the network.
- Call for (co-)funding User Involvement Events: Each year, we make a budget available for (co-)financing User Involvement events, such as summer schools, tutorials, seminars and master classes, organized by representatives of national consortia. So far, 20 such events have been organized. They have covered a wide range of topics and targeted diverse research communities, extending CLARIN's outreach and uptake efforts well beyond the capacities of CLARIN ERIC alone and supporting the long-term sustainability of the outreach model. Whenever possible, talks and lectures have been recorded and published on the VideoLectures.NET portal, which now offers over 100 videos from 12 CLARIN events.
We are pleased to announce that the long-running JOTA talks, a series of lectures on NLP-related topics by Slovene and foreign researchers organized by the Slovenian Language Technologies Society, are now also available on VideoLectures.NET: videolectures.net/jota.
Learn more about JOTA on the website of the Slovenian Language Technologies Society: www.sdjt.si/
Nikola Ljubešić (Jožef Stefan Institute, Ljubljana) and Yves Scherrer (University of Geneva) have been ranked first among 16 different systems in the CLIN2017 shared task on normalising historical text with their system (https://github.com/clarinsi/