CLARIN workshop
Multilingual corpus annotation tools:
development and integration
Ljubljana, November 10 – 11, 2016
Introduction
Basic annotation of language corpora is a prerequisite for corpus linguistics and for any advanced exploration of the information content of language. Yet for many CLARIN languages, online annotation tools are not available. This two-day workshop aimed to close this gap by bringing together CLARIN members that have locally developed annotation tools or resources, in order to align their specifications and offer them as web services within the WebLicht architecture. The planned multilingual web services will make workflow construction and execution environments more useful and will feed back into their development and documentation.
The workshop catalogued the participants' available tools, resources and encoding standards and proposed a workplan for integrating them with WebLicht, while also considering similar environments such as TextFlows, developed at JSI. The concrete result of the workshop is an implementation plan with a timeline.
Agenda
First Day
Thursday, November 10th, Physics seminar room:
9:00 – 9:30 | Introduction | T. Erjavec, D. Fišer |
9:30 – 10:30 | WebLicht | M. Hinrichs, W. Qiu |
10:30 – 10:45 | Coffee break | |
10:45 – 11:15 | TextFlows | S. Pollak, M. Martinc, M. Perovšek |
11:15 – 12:15 | ReLDI data & tools | N. Ljubešić |
12:15 – 12:45 | Estonian data & tools | K. Liin |
12:45 – 13:45 | Lunch | |
13:45 – 14:15 | Latvian data & tools | I. Skadiņa, R. Darģis, L. Pretkalniņa |
14:15 – 14:45 | Discussion | all |
14:45 – 15:00 | Coffee break | |
15:00 – 16:30 | Discussion | all |
19:00 – | Dinner at “Špajza” | |
Second Day
Friday, November 11th, Biochemistry seminar room:
9:00 – 9:30 | Italian data & tools | R. Del Gratta |
9:30 – 10:00 | Czech data & tools | P. Stranak |
10:00 – 11:00 | WebLicht Hackathon | all |
11:00 – 11:15 | Coffee break | |
11:15 – 12:45 | WebLicht Hackathon + Drafting the workplan | all |
12:45 – 13:45 | Lunch | |
13:45 – 14:45 | Drafting the workplan | all |
14:45 – 15:00 | Coffee break | |
15:00 – 16:30 | Workplan discussion | all |
Envisaged implementation project
Note that the plan is still under development!
- Basic annotation services in WebLicht:
- Tools for tokenisation, sentence segmentation, morphosyntactic tagging and lemmatisation exposed as Web services and integrated with WebLicht (internet protocol, TCF I/O)
- Languages covered: sl, hr, sr, lv, et, cs, it
- Basic WebLicht documentation and a short video tutorial will be prepared in national languages
- Normalisation of words will be added to WebLicht, in the first instance covering Slovene (sl) computer-mediated communication (CMC)
- Evaluation
- The functioning of the tools will be tested with Bombard and Awesome profilers
- A user-centred evaluation will be prepared and carried out
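
To make the "TCF I/O" requirement in the first item of the plan more concrete, the following is a minimal sketch of what a tokenisation and sentence segmentation step could emit. It is not any participant's tool: the tokeniser and segmenter are deliberately naive, all function names are hypothetical, and the element names and namespaces reflect an assumed reading of TCF 0.4 that should be checked against the WebLicht documentation before use.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces assumed from TCF 0.4 documents; verify against the TCF specification.
DSPIN_NS = "http://www.dspin.de/data"
TC_NS = "http://www.dspin.de/data/textcorpus"


def tokenise(text):
    """Toy tokeniser (hypothetical): runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Toy segmenter (hypothetical): a sentence ends at '.', '!' or '?'.

    Returns a list of sentences, each a list of token indices.
    """
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


def to_tcf(text, lang):
    """Build a TCF-like XML string with text, tokens and sentences layers."""
    tokens = tokenise(text)
    root = ET.Element(f"{{{DSPIN_NS}}}D-Spin", {"version": "0.4"})
    corpus = ET.SubElement(root, f"{{{TC_NS}}}TextCorpus", {"lang": lang})
    ET.SubElement(corpus, f"{{{TC_NS}}}text").text = text

    # Tokens layer: each token gets an ID that later layers refer to.
    tokens_el = ET.SubElement(corpus, f"{{{TC_NS}}}tokens")
    ids = []
    for i, tok in enumerate(tokens, start=1):
        tid = f"t{i}"
        ids.append(tid)
        ET.SubElement(tokens_el, f"{{{TC_NS}}}token", {"ID": tid}).text = tok

    # Sentences layer: each sentence lists the IDs of its tokens.
    sents_el = ET.SubElement(corpus, f"{{{TC_NS}}}sentences")
    for n, idxs in enumerate(split_sentences(tokens), start=1):
        ET.SubElement(
            sents_el,
            f"{{{TC_NS}}}sentence",
            {"ID": f"s{n}", "tokenIDs": " ".join(ids[i] for i in idxs)},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_tcf("Dober dan. Kako si?", "sl"))
```

Tagging and lemmatisation layers would follow the same pattern, adding annotations that point back to token IDs rather than repeating the text, which is what lets independently developed tools be chained in one workflow.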