dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Bruda, Ştefan |
dc.contributor.author | Derzhanski, Ivan |
dc.contributor.author | Dimitrova, Ludmila |
dc.contributor.author | Garabík, Radovan |
dc.contributor.author | Holozan, Peter |
dc.contributor.author | Ide, Nancy |
dc.contributor.author | Kaalep, Heiki-Jaan |
dc.contributor.author | Kotsyba, Natalia |
dc.contributor.author | Oravecz, Csaba |
dc.contributor.author | Petkevič, Vladimír |
dc.contributor.author | Priest-Dorman, Greg |
dc.contributor.author | Shevchenko, Igor |
dc.contributor.author | Simov, Kiril |
dc.contributor.author | Sinapova, Lydia |
dc.contributor.author | Steenwijk, Han |
dc.contributor.author | Tihanyi, Laszlo |
dc.contributor.author | Tufiş, Dan |
dc.contributor.author | Véronis, Jean |
dc.date.accessioned | 2015-06-15T08:46:04Z |
dc.date.available | 2015-06-15T08:46:04Z |
dc.date.issued | 2010-05-14 |
dc.identifier.uri | http://hdl.handle.net/11356/1041 |
dc.description | The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of the word; (2) the lemma, the base-form of the word; (3) the MSD, the morphosyntactic description of the word-form, i.e., its fine-grained PoS tag, as defined in the MULTEXT-East morphosyntactic specifications. This submission contains the freely available MULTEXT-East lexicons, while a separate submission (http://hdl.handle.net/11356/1042) gives those that are available only for non-commercial use. |
dc.language.iso | bul |
dc.language.iso | ces |
dc.language.iso | eng |
dc.language.iso | est |
dc.language.iso | fra |
dc.language.iso | hun |
dc.language.iso | ron |
dc.language.iso | slk |
dc.language.iso | slv |
dc.language.iso | ukr |
dc.publisher | Jožef Stefan Institute |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/211938 |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-011-9174-8 |
dc.relation.replaces | http://hdl.handle.net/11372/LRT-675 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://nl.ijs.si/ME/Vault/V4/ |
dc.subject | lemmatisation |
dc.subject | inflection |
dc.subject | part-of-speech tagging |
dc.subject | multilingual |
dc.title | MULTEXT-East free lexicons 4.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://nl.ijs.si/ME/Vault/V4/doc/index.html#sec-lex |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | EU Copernicus COP-106 MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages Other |
sponsor | EU Copernicus TELRI Trans-European Language Resources Infrastructure Other |
sponsor | EU Copernicus CONCEDE Consortium for Central European Dictionary Encoding Other |
sponsor | FP7 Capacities MONDILEX Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources euFunds info:eu-repo/grantAgreement/EC/FP7/211938 |
size.info | 3665864 entries |
files.count | 12 |
files.size | 17058784 |
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Total download size: 16.27 MB.

Name | Size | Format | Description | MD5 |
wfl-bg.txt.gz | 343.37 KB | application/gzip | Bulgarian lexicon, 55199 entries | fdcae6385aa347e08498f46411d9eeef |
wfl-cs.txt.gz | 755.86 KB | application/gzip | Czech lexicon, 184624 entries | 6fa90e6d75a470c35289e9b7287b4248 |
wfl-en.txt.gz | 339.68 KB | application/gzip | English lexicon, 71784 entries | 83e57f297ff2568bcbd8bf3b104a4d82 |
wfl-et.txt.gz | 727.49 KB | application/gzip | Estonian lexicon, 135095 entries | 2b264f19451af996b80efb95ea065b2e |
wfl-fr.txt.gz | 1.35 MB | application/gzip | French lexicon, 306792 entries | 09324e2f0b7524e2f5bbd8bb9bb86ba7 |
wfl-hu.txt.gz | 402.95 KB | application/gzip | Hungarian lexicon, 64035 entries | e874f53e58feb23fb4646dbeedc2768f |
wfl-ro.txt.gz | 2.13 MB | application/gzip | Romanian lexicon, 428194 entries | 7f439a48ec8e7e597bb5b0ba99c23a3b |
wfl-sk.txt.gz | 7.85 MB | application/gzip | Slovak lexicon, 1910872 entries | 32c7e64d8a840f7e6684b12313a88b22 |
wfl-sl-rozaj.txt.gz | 7.07 KB | application/gzip | Resian lexicon, 965 entries | 7fc1576b33fa9082a3fbd33b5534383b |
wfl-sl.txt.gz | 938.84 KB | application/gzip | Slovene lexicon, 208012 entries | c295008fa847f06ae32b6bd15ea24f4c |
wfl-uk.txt.gz | 1.5 MB | application/gzip | Ukrainian lexicon, 300292 entries | 6454387c48ad78fc50f408eb57ebe511 |
00README.txt | 4.41 KB | Text file | Unknown | 3fd6a9b7c42e7422d75c0ca9fc2c75da |

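
The MD5 values listed for each file can be used to check a download for corruption. The following is a minimal sketch, assuming the file has already been saved locally; the function names `md5_of_file` and `matches_listing` are hypothetical helpers introduced here and are not part of the resource.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, expected_md5: str) -> bool:
    """Compare a local file's MD5 with the value shown in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_listing("wfl-bg.txt.gz", "fdcae6385aa347e08498f46411d9eeef")` should return `True` for an intact copy of the Bulgarian lexicon. MD5 is suitable here only as a transfer-integrity check, not as protection against tampering.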
MULTEXT-East Lexica Version 4
http://nl.ijs.si/ME/V4/

This directory contains the following files:

00README.txt | This file |

Word-form lexica in MULTEXT format, with conditions on availability:

wfl-bg.txt | Bulgarian | free |
wfl-cs.txt | Czech | free |
wfl-en.txt | English | free |
wfl-et.txt | Estonian | free |
wfl-fr.txt | French | free |
wfl-hu.txt | Hungarian | free |
wfl-ro.txt | Romanian | free |
wfl-sk.txt | Slovak | free |
wfl-sl-rozaj.txt | Resian (sl dialect) | free |
wfl-sl.txt | Slovene | free |
wfl-uk.txt | Ukrainian | free |

Separate submission:

wfl-fa.txt | Farsi/Persian | license for research use only |
wfl-mk.txt | Macedonian | license for research use only |
wfl-pl.txt | Polish | license for research use only |
wfl-ru.txt | Russ . . .
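
The entry format given in the item description (one entry per line, with three tab-separated fields: word-form, lemma, MSD) can be read with a short script. The following is a minimal sketch, not an official reader: the names `LexEntry`, `parse_entry`, and `read_lexicon` are hypothetical, and UTF-8 encoding is an assumption that should be checked against the README.

```python
import gzip
from typing import Iterator, NamedTuple


class LexEntry(NamedTuple):
    word_form: str  # the inflected form of the word
    lemma: str      # the base form of the word
    msd: str        # morphosyntactic description (fine-grained PoS tag)


def parse_entry(line: str) -> LexEntry:
    """Split one lexicon line into its three tab-separated fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 tab-separated fields, got {len(fields)}: {line!r}"
        )
    return LexEntry(*fields)


def read_lexicon(path: str, encoding: str = "utf-8") -> Iterator[LexEntry]:
    """Yield entries from a gzip-compressed lexicon, skipping blank lines.

    The default encoding is an assumption, not stated in the record.
    """
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                yield parse_entry(line)
```

A typical use would be `for entry in read_lexicon("wfl-en.txt.gz"): ...`, reading `entry.word_form`, `entry.lemma`, and `entry.msd` for each line. Raising on malformed lines, rather than skipping them silently, makes it easier to notice a wrong encoding or an unexpected file layout early.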