MULTEXT-East Lexica
                             Version 4
                       http://nl.ijs.si/ME/V4/

This directory contains the following files:

00README.txt     This file

Word-form lexica in MULTEXT format, with conditions on availability:

wfl-bg.txt       Bulgarian            free
wfl-cs.txt       Czech                free
wfl-en.txt       English              free
wfl-et.txt       Estonian             free
wfl-fr.txt       French               free
wfl-hu.txt       Hungarian            free
wfl-ro.txt       Romanian             free
wfl-sk.txt       Slovak               free
wfl-sl-rozaj.txt Resian (sl dialect)  free
wfl-sl.txt       Slovene              free
wfl-uk.txt       Ukrainian            free  

Separate submission:
wfl-fa.txt       Farsi/Persian        license for research use only
wfl-mk.txt       Macedonian           license for research use only
wfl-pl.txt       Polish               license for research use only  
wfl-ru.txt       Russian              license for research use only
wfl-sr.txt       Serbian              license for research use only

The word-form lexica are in MULTEXT format: each entry is on a
separate line and contains (at least) three fields. The first field of
the entry is the word-form, the second the lemma, and the third the
morphosyntactic description (MSD). Some lexica also make use of further
columns, e.g. the Persian lexicon gives the ASCII transliteration of the
word-form and lemma.
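To illustrate this layout, here is a minimal Python sketch that splits one
lexicon line into its fields and keeps any extra columns. It is not part of
the distribution. The names Entry and parse_entry are hypothetical, and the
sample line in the usage note is invented for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One lexicon entry: the three mandatory fields plus any extra columns."""
    word_form: str
    lemma: str
    msd: str
    extra: tuple[str, ...] = ()


def parse_entry(line: str) -> Entry:
    """Split one lexicon line into its fields.

    Assumes TAB-separated fields; any columns after the third
    (e.g. the Persian transliterations) are kept in `extra`.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    word_form, lemma, msd, *extra = fields
    return Entry(word_form, lemma, msd, tuple(extra))
```

For example, parse_entry("houses\thouse\tNcnp\n") gives an Entry with
word_form "houses", lemma "house", msd "Ncnp" and no extra columns.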

The files are encoded in UTF-8, with TAB (^I) as the field separator and
Unix-type end-of-lines (^J). Entries are sorted in UTF-8 order. When the
word-form or lemma contains spaces, these are replaced by underscores.
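Here is a minimal Python sketch of reading a lexicon file under these
conventions. The function names are hypothetical. The sort check assumes
that the whole line is the sort key, as a byte-wise sort such as
`LC_ALL=C sort` would produce.

```python
from typing import Iterator


def read_lexicon(path: str) -> Iterator[tuple[str, str, str]]:
    """Yield (word_form, lemma, msd) triples from a lexicon file.

    Underscores in the word-form and lemma are turned back into spaces.
    Caveat (assumption): this cannot tell a space-replacing underscore
    from a literal underscore, should a lexicon contain any.
    """
    # newline="\n": the files use Unix end-of-lines, so no translation is needed
    with open(path, encoding="utf-8", newline="\n") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate a trailing blank line
            word_form, lemma, msd = line.rstrip("\n").split("\t")[:3]
            yield word_form.replace("_", " "), lemma.replace("_", " "), msd


def is_utf8_sorted(lines: list[str]) -> bool:
    """Check that lines are in UTF-8 byte order.

    For valid UTF-8, byte order coincides with code point order.
    """
    keys = [line.encode("utf-8") for line in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

Comparing encoded bytes rather than locale-aware collation keeps the check
independent of the user's locale settings.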

The MSDs are defined in the MULTEXT-East morphosyntactic
specifications, http://nl.ijs.si/ME/V4/msd/
Note that the lexica use the definitions for the particular language, 
not the common ones.
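MSDs are positional. The first character is the category code, each
following character is the value code of one attribute, and '-' marks an
attribute that does not apply. Here is a minimal Python sketch of decoding
an MSD under that encoding. The attribute table is a hypothetical, abridged
stand-in for the language-particular tables in the specifications. The
sketch also accepts MSDs shorter than the attribute list and treats missing
trailing positions as not applicable.

```python
# Hypothetical, abridged table: the real attribute lists per category and
# language must be taken from the MULTEXT-East specifications.
EXAMPLE_ATTRS: dict[str, tuple[str, list[str]]] = {
    "N": ("Noun", ["Type", "Gender", "Number", "Case"]),
}


def decode_msd(msd: str, attrs: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Map each position of a positional MSD to its attribute name.

    Position 0 is the category code; position i (i >= 1) holds the value
    code of the i-th attribute of that category. Attributes marked '-'
    are left out of the result, as are missing trailing positions.
    """
    if not msd:
        raise ValueError("empty MSD")
    category, attr_names = attrs[msd[0]]  # KeyError for an unknown category
    values = msd[1:]
    if len(values) > len(attr_names):
        raise ValueError(f"MSD {msd!r} has more positions than {category} has attributes")
    decoded = {"Category": category}
    for name, code in zip(attr_names, values):
        if code != "-":
            decoded[name] = code
    return decoded
```

With the table above, decode_msd("Nc-p", EXAMPLE_ATTRS) keeps Type "c" and
Number "p" and skips Gender and Case.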

Responsibility:

Bulgarian:
L. Dimitrova, L. Sinapova, K. Simov, D. Popov, Sv. Manova-Vidinska
Department of Mathematical Linguistics
Institute of Mathematics and Informatics
Bulgarian Academy of Sciences

Czech: 
V. Petkevic, J. Klimova and V. Schmiedtova
Institute of Theoretical and Computational Linguistics
Faculty of Philosophy
Charles University

Estonian: 
H.J. Kaalep, E. Toomsalu
Department of General Linguistics
Tartu University

English: 
N. Ide, G. Priest-Dorman
Dept. of Computer Science
Vassar College

Farsi:
B. QasemiZadeh and S. Rahimi
Digital Enterprise Research Institute
Galway, Ireland

Hungarian: 
C. Oravecz and L. Tihanyi
Research Institute for Linguistics
Hungarian Academy of Sciences

Macedonian:
Aleksandar Petrovski

Polish:
N. Kotsyba(1), I. Derzhanski(2), and A. Radziszewski(3)
(1) Institute of Interdisciplinary Studies, Warsaw University
(2) Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
(3) Institute of Informatics, Wroclaw University of Technology

Resian: 
Han Steenwijk
Dipartimento di Lingue e Letterature Anglo-Germaniche e Slave
Padova University

Romanian: 
S. Bruda, C. Diaconu, L. Diaconu, and D. Tufis
Center for Research in Machine Learning, 
Natural Language Processing and Conceptual Modelling
Romanian Academy of Sciences

Serbian:
C. Krstev
Computer Science Department
Faculty of Mathematics
University of Belgrade

Slovak: 
R. Garabik
L. Stur Institute of Linguistics
Slovak Academy of Sciences

Slovene: 
T. Erjavec
Dept. of Knowledge Technologies, Jozef Stefan Institute

Ukrainian:
N. Kotsyba(1), I. Shevchenko(2), I. Derzhanski(3), and A. Mykulyak(1)
(1) Institute of Interdisciplinary Studies, Warsaw University
(2) Ukrainian Lingua-Information Fund, National Academy of Sciences of Ukraine
(3) Institute of Mathematics and Informatics, Bulgarian Academy of Sciences


The MULTEXT-East partners would like to acknowledge the contributors
of the following lexica which served as the basis of the MULTEXT-East
ones:

Czech lexicon:      dr. Jan Hajic and BYLL Software

Hungarian lexicon:  MorphoLogic

Slovene lexicon:    Amebis d.o.o.

Polish lexicon:     Marcin Woliński: Morfeusz morphological analyzer
                    (http://nlp.ipipan.waw.pl/~wolinski/morfeusz/), cf.
                    Marcin Woliński. Morfeusz, a Practical Tool for the 
                    Morphological Analysis of Polish. 
                    In: Intelligent Information Processing and Web Mining, 
                    IIS:IIPWM'06 Proceedings, pp. 503-512, Springer, 2006.

================================================================================
Tomaz Erjavec, JSI
2010-05-09