Slovenska raziskovalna infrastruktura za jezikovne vire in tehnologije
Common Language Resources and Technology Infrastructure, Slovenia


CLASSLA K Centre workshops

The workshops organised by CLASSLA are presented below.

April to September 2024: CLASSLA-Express – Workshops on using CLARIN.SI corpora in language research

From April to September 2024, a series of six CLASSLA-Express workshops will take place in five countries: Croatia (Zagreb and Rijeka), Serbia (Belgrade), North Macedonia (Skopje), Bulgaria (Sofia) and Slovenia (Ljubljana). The workshops aim to show participants how to use the CLASSLA-web corpora in language research and include hands-on exercises on querying the corpora of Bulgarian, Croatian, Macedonian, Serbian and Slovene. More details on the CLASSLA-Express workshops are available here.

Report from the first two stops of CLASSLA-Express: Zagreb and Rijeka

CLASSLA-Express is in full swing, with two workshops already completed. The first was held on April 19 at the Faculty of Humanities and Social Sciences in Zagreb, with 22 participants. The second followed at the Faculty of Humanities and Social Sciences in Rijeka, with 16 participants in attendance.

Each workshop was divided into two parts: the first focused on theory and the second on practical applications. During the theoretical part, both workshop convenors gave introductory lectures. Ivana Filipović Petrović introduced the Slovenian national consortium CLARIN.SI and presented the CLASSLA corpora developed at the CLASSLA Knowledge Centre, while Jelena Parizoska gave an overview of how computer corpora are applied in linguistic research.

The practical part introduced participants to basic and advanced searches in the NoSketch Engine concordancer, which hosts the freely accessible CLASSLA-web corpora. Participants actively engaged in the exercises and asked questions related to their own linguistic interests. They particularly enjoyed formulating CQL queries together with the convenors on site and solving exercises that required them to find results on their own. The discussion raised many topical questions, such as how large language models can be used in linguistic research. The overall conclusion was that corpora are still needed as a source of trustworthy examples and of distributions of human language use, but that large language models can be used to enrich or filter corpus evidence.

The CLASSLA-Express team is looking forward to the upcoming stops.

November 2021: Workshop on regional markedness in text

On 6 and 7 November 2021, an online workshop dedicated to regional markedness in text took place, organised by the ReLDI centre, the University of Zurich and CLASSLA.

The programme of the two-day event included the keynote talk “Computational dialectology” by Yves Scherrer from the University of Helsinki, Darja Fišer’s presentation of student research at the JTDH Language Technologies and Digital Humanities Conference, and two interactive workshops: “Interactive workshop on regional variation in text”, led by Sara Košutar, Larissa Schmidt and Leyla Feiner, and “Regional variation in gender marking: a hands-on tutorial on extracting data from corpora”, led by Mirjana Starović and Tanja Samardžić.

The materials for the workshop on regional variation in gender marking are available here. They provide a gentle introduction to corpus analysis and cover:

  • which South Slavic corpora are available in the CLARIN.SI repository, and how to find comparable corpora
  • how to explore corpora through the NoSketch Engine and KonText concordancers
  • how to query the corpora using the CQL (Corpus Query Language) syntax
  • how to analyse gender marking in each South Slavic corpus by counting the occurrences of feminine and masculine nouns describing occupations (e.g. the feminine and masculine nouns for the word “director”)
  • how to use the morphosyntactic descriptions (MSDs) to analyse the distribution of verbs with feminine and masculine suffixes (e.g. “mislila” vs. “mislil” for “she/he thought”)
  • and finally, how to interpret the results to analyse gender bias in society.
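The counting step described in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the workshop materials: the lemma pairs, the positional attribute name `lemma`, and the helper names (`OccupationPair`, `feminine_share`, `per_million`) are hypothetical. The hit counts themselves would come from running each CQL query in the concordancer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OccupationPair:
    """Feminine and masculine lemmas of one occupation noun (hypothetical pairs)."""

    language: str
    feminine: str
    masculine: str

    def cql_queries(self) -> tuple[str, str]:
        # Lemma-based CQL queries, one per form. Assumes the corpus exposes
        # a positional attribute called "lemma" and lemmatises the feminine
        # noun as its own lemma.
        return f'[lemma="{self.feminine}"]', f'[lemma="{self.masculine}"]'


def per_million(hits: int, corpus_tokens: int) -> float:
    """Relative frequency, so that corpora of different sizes can be compared."""
    if corpus_tokens <= 0:
        raise ValueError("corpus size must be positive")
    return hits * 1_000_000 / corpus_tokens


def feminine_share(feminine_hits: int, masculine_hits: int) -> float:
    """Proportion of feminine forms among all occurrences of the pair."""
    total = feminine_hits + masculine_hits
    if total == 0:
        raise ValueError("neither form occurs in the corpus")
    return feminine_hits / total


# Hypothetical "director" pairs; verify the lemmas against each corpus first.
PAIRS = [
    OccupationPair("Slovene", "direktorica", "direktor"),
    OccupationPair("Croatian", "direktorica", "direktor"),
    OccupationPair("Serbian", "direktorka", "direktor"),
]

if __name__ == "__main__":
    for pair in PAIRS:
        fem_query, masc_query = pair.cql_queries()
        print(f"{pair.language}: {fem_query}  vs  {masc_query}")
```

Normalising to hits per million tokens matters when comparing corpora of different sizes; the feminine share, by contrast, does not depend on corpus size, because both counts come from the same corpus.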

Some excerpts of the document are presented below:

May 2020: First CLASSLA K Centre workshop

The first CLASSLA K Centre workshop was scheduled to take place in Ljubljana from May 6 to May 8, 2020, but due to the COVID-19 crisis the face-to-face workshop had to be postponed. However, the participant selection had already brought together a great group of people, so we decided to hold an online Zoom session on May 6, the day we would all have met in Ljubljana.

The session lasted two hours. In the first hour, all participants briefly introduced themselves. The second hour was devoted to a short discussion of the next steps for the workshop and the knowledge centre, kick-started by the results of a survey the participants had completed before the session. The ReLDI centre for linguistic data and the current CLARIN ERIC funding opportunities were also presented.

This discussion revealed the following priorities: (1) connecting web services with concordancers is a much sought-after feature, as it would let researchers easily process and publish their raw textual data, (2) the Knowledge Centre might need a form for reporting use cases of its resources (a draft of such a form has been made available here), and (3) the participants are very interested in holding group discussions on specific topics, which will be organised in the weeks to come.

The online session appeared to be a very pleasant experience for all 42 participants, and the Zoom photos of the participants below prove it!

We are still looking forward to the face-to-face workshop, which we hope will take place next year.