dc.contributor.author Erjavec, Tomaž
dc.contributor.author Bruda, Ştefan
dc.contributor.author Derzhanski, Ivan
dc.contributor.author Dimitrova, Ludmila
dc.contributor.author Garabík, Radovan
dc.contributor.author Holozan, Peter
dc.contributor.author Ide, Nancy
dc.contributor.author Kaalep, Heiki-Jaan
dc.contributor.author Kotsyba, Natalia
dc.contributor.author Oravecz, Csaba
dc.contributor.author Petkevič, Vladimír
dc.contributor.author Priest-Dorman, Greg
dc.contributor.author Shevchenko, Igor
dc.contributor.author Simov, Kiril
dc.contributor.author Sinapova, Lydia
dc.contributor.author Steenwijk, Han
dc.contributor.author Tihanyi, Laszlo
dc.contributor.author Tufiş, Dan
dc.contributor.author Véronis, Jean
dc.date.accessioned 2015-06-15T08:46:04Z
dc.date.available 2015-06-15T08:46:04Z
dc.date.issued 2010-05-14
dc.identifier.uri http://hdl.handle.net/11356/1041
dc.description The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of the word; (2) the lemma, the base-form of the word; (3) the MSD, the morphosyntactic description of the word-form, i.e., its fine-grained PoS tag, as defined in the MULTEXT-East morphosyntactic specifications. This submission contains the freely available MULTEXT-East lexicons, while a separate submission (http://hdl.handle.net/11356/1042) gives those that are available only for non-commercial use.
dc.language.iso bul
dc.language.iso ces
dc.language.iso eng
dc.language.iso est
dc.language.iso fra
dc.language.iso hun
dc.language.iso ron
dc.language.iso slk
dc.language.iso slv
dc.language.iso ukr
dc.publisher Jožef Stefan Institute
dc.relation info:eu-repo/grantAgreement/EC/FP7/211938
dc.relation.isreferencedby https://doi.org/10.1007/s10579-011-9174-8
dc.relation.replaces http://hdl.handle.net/11372/LRT-675
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/ME/Vault/V4/
dc.subject lemmatisation
dc.subject inflection
dc.subject part-of-speech tagging
dc.subject multilingual
dc.title MULTEXT-East free lexicons 4.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri http://nl.ijs.si/ME/Vault/V4/doc/index.html#sec-lex
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor EU Copernicus COP-106 MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages Other
sponsor EU Copernicus TELRI Trans-European Language Resources Infrastructure Other
sponsor EU Copernicus CONCEDE Consortium for Central European Dictionary Encoding Other
sponsor FP7 Capacities MONDILEX Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources euFunds info:eu-repo/grantAgreement/EC/FP7/211938
size.info 3665864 entries
files.count 12
files.size 17058784
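
The three-field line format given in dc.description (word-form, lemma, MSD, separated by tabs) can be read with a few lines of Python. This is a minimal sketch, assuming the downloaded files are gzip-compressed and UTF-8 encoded. The names LexEntry, parse_lexicon and read_lexicon are hypothetical helpers, not part of the resource, and the sample lines and their MSD tags are made up for illustration.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class LexEntry(NamedTuple):
    """One lexical entry (hypothetical helper type, not part of the resource)."""
    wordform: str  # field 1: the inflected form of the word
    lemma: str     # field 2: the base form of the word
    msd: str       # field 3: the morphosyntactic description


def parse_lexicon(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield one LexEntry per non-empty line of three tab-separated fields."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated fields, got {len(fields)}"
            )
        yield LexEntry(*fields)


def read_lexicon(path: str, encoding: str = "utf-8") -> Iterator[LexEntry]:
    """Read a gzip-compressed lexicon file; UTF-8 is an assumption."""
    with gzip.open(path, "rt", encoding=encoding) as handle:
        yield from parse_lexicon(handle)


# Made-up sample lines; the MSD values are illustrative only.
sample = ["walks\twalk\tVmip3s\n", "walked\twalk\tVmis\n"]
entries = list(parse_lexicon(sample))
```

With a downloaded file in place, `read_lexicon("wfl-en.txt.gz")` would iterate over its entries in the same way. Parsing is kept separate from file handling so that the line format can be checked without a download.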


 Files in this item

 Download all files in item (16.27 MB)
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name                 Size       Format            Description                        MD5
wfl-bg.txt.gz        343.37 KB  application/gzip  Bulgarian lexicon, 55199 entries   fdcae6385aa347e08498f46411d9eeef
wfl-cs.txt.gz        755.86 KB  application/gzip  Czech lexicon, 184624 entries      6fa90e6d75a470c35289e9b7287b4248
wfl-en.txt.gz        339.68 KB  application/gzip  English lexicon, 71784 entries     83e57f297ff2568bcbd8bf3b104a4d82
wfl-et.txt.gz        727.49 KB  application/gzip  Estonian lexicon, 135095 entries   2b264f19451af996b80efb95ea065b2e
wfl-fr.txt.gz        1.35 MB    application/gzip  French lexicon, 306792 entries     09324e2f0b7524e2f5bbd8bb9bb86ba7
wfl-hu.txt.gz        402.95 KB  application/gzip  Hungarian lexicon, 64035 entries   e874f53e58feb23fb4646dbeedc2768f
wfl-ro.txt.gz        2.13 MB    application/gzip  Romanian lexicon, 428194 entries   7f439a48ec8e7e597bb5b0ba99c23a3b
wfl-sk.txt.gz        7.85 MB    application/gzip  Slovak lexicon, 1910872 entries    32c7e64d8a840f7e6684b12313a88b22
wfl-sl-rozaj.txt.gz  7.07 KB    application/gzip  Resian lexicon, 965 entries        7fc1576b33fa9082a3fbd33b5534383b
wfl-sl.txt.gz        938.84 KB  application/gzip  Slovene lexicon, 208012 entries    c295008fa847f06ae32b6bd15ea24f4c
wfl-uk.txt.gz        1.5 MB     application/gzip  Ukrainian lexicon, 300292 entries  6454387c48ad78fc50f408eb57ebe511
00README.txt         4.41 KB    Text file         Unknown                            3fd6a9b7c42e7422d75c0ca9fc2c75da
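
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download arrived intact. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_stream, md5_of_file and matches_listing are hypothetical, and the local path passed to them is assumed to be wherever the file was saved.

```python
import hashlib
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches_listing(path: str, expected_md5: str) -> bool:
    """Compare a local file against the MD5 value shown in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_listing("wfl-bg.txt.gz", "fdcae6385aa347e08498f46411d9eeef")` should return True for an undamaged copy of the Bulgarian lexicon. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.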
File preview (00README.txt):

                        MULTEXT-East Lexica
                             Version 4
                       http://nl.ijs.si/ME/V4/

This directory contains the following files:

00README.txt     This file

Word-form lexica in MULTEXT format, with conditions on availability:

wfl-bg.txt       Bulgarian            free
wfl-cs.txt       Czech                free
wfl-en.txt       English              free
wfl-et.txt       Estonian             free
wfl-fr.txt       French               free
wfl-hu.txt       Hungarian            free
wfl-ro.txt       Romanian             free
wfl-sk.txt       Slovak               free
wfl-sl-rozaj.txt Resian (sl dialect)  free
wfl-sl.txt       Slovene              free
wfl-uk.txt       Ukrainian            free  

Separate submission:
wfl-fa.txt       Farsi/Persian        license for research use only
wfl-mk.txt       Macedonian           license for research use only
wfl-pl.txt       Polish               license for research use only  
wfl-ru.txt       Russ . . .