dc.contributor.author | Erjavec, Tomaž |
dc.date.accessioned | 2015-05-25T19:41:04Z |
dc.date.available | 2015-05-25T19:41:04Z |
dc.date.issued | 2014-09-13 |
dc.identifier.uri | http://hdl.handle.net/11356/1032 |
dc.description | The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora and contains attested and manually verified word forms and their annotations, with examples of use. A lexicon entry contains the modern lemma with its part of speech and, for archaic words, its gloss (the closest modern equivalent(s) or a short explanation of its meaning). The lemma is followed by its modern word forms as attested in the corpus (i.e. the complete paradigm of the lemma is not given), and each of these is followed by all its attested historical word forms with examples of usage. The lexicon is available in the source TEI P5 XML format and in a much smaller and simpler derived tabular format, which does not contain usage examples. In the latter, multi-word units are joined with an underscore. The 1st column is the historical word form, the 2nd its modern equivalent, the 3rd its modern lemma, the 4th its PoS tag from the IMP morphosyntactic specification, and the 5th (where present) the gloss, e.g.: ako_ravno<TAB>akoravno<TAB>akoravno<TAB>C<TAB>čeprav<LF> or ak-li<TAB>ako_li<TAB>ako_li<TAB>C_Q<TAB><LF> |
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/215064 |
dc.relation.isreferencedby | https://doi.org/10.1007/s10579-015-9294-7 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://nl.ijs.si/imp/index-en.html |
dc.subject | historical language |
dc.subject | modernisation |
dc.subject | lemmatisation |
dc.subject | TEI |
dc.title | Lexicon of historical Slovene imp25k 1.1 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | lexicon |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
demo.uri | https://nl.ijs.si/imp/imp25k/html-s/ |
contact.person | Tomaž Erjavec; tomaz.erjavec@ijs.si; Jožef Stefan Institute |
sponsor | EU; FP7-ICT-215064; IMPACT "Improving Access to Text"; euFunds; info:eu-repo/grantAgreement/EC/FP7/215064 |
sponsor | Google Inc.; Google research award; Developing Language Models of Historical Slovene; Other |
size.info | 28034 entries |
files.count | 2 |
files.size | 26650978 |
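The tabular format described in the `dc.description` row can be read with a few lines of code. The following is a minimal sketch, not part of the distributed resource: the names `LexiconRow`, `parse_line`, `split_mwu` and `read_lexicon` are hypothetical, and it assumes the file is UTF-8 encoded, has no header or comment lines, and that `_` never occurs inside a single word (so splitting on it recovers the words of a multi-word unit).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class LexiconRow:
    form: str              # column 1: historical word form
    modern: str            # column 2: modern equivalent
    lemma: str             # column 3: modern lemma
    pos: str               # column 4: PoS tag (IMP morphosyntactic specification)
    gloss: Optional[str]   # column 5: gloss, or None when absent or empty


def split_mwu(value: str) -> list[str]:
    """Split an underscore-joined multi-word unit into its words.

    Assumption: '_' never occurs inside a single word.
    """
    return value.split("_")


def parse_line(line: str) -> LexiconRow:
    """Parse one tab-separated line of the derived tabular format."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 4:
        raise ValueError(f"expected at least 4 tab-separated columns: {line!r}")
    # An empty 5th column (trailing tab, as in the ak-li example) counts as no gloss.
    gloss = cols[4] if len(cols) > 4 and cols[4] else None
    return LexiconRow(cols[0], cols[1], cols[2], cols[3], gloss)


def read_lexicon(path: str, encoding: str = "utf-8") -> Iterator[LexiconRow]:
    """Yield one LexiconRow per non-blank line of the file.

    Assumptions: UTF-8 encoding and no header or comment lines.
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():
                yield parse_line(line)
```

For example, `parse_line("ak-li\tako_li\tako_li\tC_Q\t\n")` yields a row whose `gloss` is `None` and for which `split_mwu(row.modern)` returns `["ako", "li"]`. After unpacking `IMP-lexicon-txt.zip`, the whole file could be iterated with `read_lexicon("IMP-lexicon/imp25k.txt")`, assuming the archive unpacks to that path.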
Files in this item
All files in item: 25.42 MB
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)



- Name: IMP-lexicon-tei.zip
- Size: 24.94 MB
- Format: application/zip
- Description: Source TEI P5
- MD5: 55ddb7a0939c6eda8b836bbdbf3deeda
- Contents:
  - IMP-lexicon
  - imp25k-header-sl.html (120 kB)
  - imp25k.xml (166 MB)
  - imp25k-header-en.html (119 kB)
  - schema
  - tei_imp.zip (48 kB)
  - tei_imp.rnc (192 kB)
  - tei_imp.dtd (161 kB)
  - tei_imp_doc.html (2 MB)
  - tei_imp_doc.pdf (1 MB)
  - tei_imp_schema.xml (4 kB)
  - tei_imp.rng (421 kB)
  - imp-page.dtd (1 kB)
  - 00README.txt (2 kB)

- Name: IMP-lexicon-txt.zip
- Size: 487.24 KB
- Format: application/zip
- Description: Derived tabular format.
- MD5: c23f34246ad609b1bfa32b463019f188
- Contents:
  - IMP-lexicon
  - imp25k.txt (2 MB)
  - imp25k-header-sl.html (120 kB)
  - imp25k-header-en.html (119 kB)
  - 00README.txt (2 kB)