CLARIN Café – the CrowLL project
We would like to cordially invite you to the latest CLARIN Café, which will be held on 4 April 2024 on the topic of the CrowLL project: Creating pedagogical corpora with annotation of sensitive content and offensive language.
More information about the event is available below and at the following link.
CLARIN Café – Creating pedagogical corpora with annotation of sensitive content and offensive language – the CrowLL project
Date: 4 April 2024
Time: 14:00 – 16:00 (CEST)
Venue: CLARIN virtual Zoom meeting
ABOUT: The main goal of the CrowLL project was to create manually annotated pedagogical corpora that can be used by lexicographers, language teachers, and NLP researchers. The languages covered were Brazilian Portuguese, Dutch, Estonian, and Slovene. Corpus sentences are annotated as “problematic” or “non-problematic” from the point of view of their use for pedagogical purposes. Sentences labelled as problematic also carry annotations specifying the category of the problem (offensive, vulgar, sensitive content, grammar/spelling problems, incomprehensible/lack of context). For each language, the corpus consists of 10,000 sentences annotated by language experts. These corpora, together with annotation guidelines in each language and in English, are available on PORTULAN CLARIN. In this CLARIN Café, we will share the steps that were followed to create these manually annotated corpora and discuss some of the challenges that were faced. We will also demo the game to foster the expansion of this type of data collection to other languages. Finally, we will reflect on the future steps of the project.
You can register for free using this link to receive the meeting room details.