The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora. It contains attested, manually verified word forms and their annotations, with examples of use. A lexicon entry contains the modern lemma with its part of speech and, for archaic words, its gloss (the closest modern equivalent(s) or a short explanation of its meaning). The lemma is followed by those of its modern word forms that occur in the corpus (i.e. the complete paradigm of the lemma is not given), and each of these lists all its attested historical word forms with examples of usage.
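The nesting described above (lemma, then modern word forms, then historical word forms with examples) can be pictured as a small data model. The following Python sketch is illustrative only: the class and field names are hypothetical and do not correspond to the element names used in the distributed files, and the values are placeholders rather than lexicon data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoricalForm:
    """An attested historical word form with its usage examples."""
    form: str
    examples: List[str] = field(default_factory=list)


@dataclass
class ModernForm:
    """A modern word form and the historical forms attested for it."""
    form: str
    historical_forms: List[HistoricalForm] = field(default_factory=list)


@dataclass
class LexiconEntry:
    """One entry: modern lemma, part of speech, optional gloss, word forms."""
    lemma: str
    pos: str
    gloss: Optional[str] = None  # given only for archaic words
    modern_forms: List[ModernForm] = field(default_factory=list)

    def all_historical_forms(self) -> List[str]:
        """Flatten the entry into a list of historical word forms."""
        return [h.form for m in self.modern_forms for h in m.historical_forms]


# Placeholder values only; these are not taken from the lexicon.
entry = LexiconEntry(
    lemma="LEMMA",
    pos="POS",
    gloss="GLOSS",
    modern_forms=[
        ModernForm(
            "MODERN_FORM",
            [
                HistoricalForm("HISTORICAL_FORM_1", ["EXAMPLE_SENTENCE"]),
                HistoricalForm("HISTORICAL_FORM_2"),
            ],
        ),
    ],
)
print(entry.all_historical_forms())
```

Because only corpus-attested forms are recorded, `modern_forms` in such a model would generally hold a subset of the lemma's paradigm.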
The lexicon is available in its source TEI P5 XML format and in a much smaller and simpler derived tabular format, which does not contain usage examples. In the latter, multi-word units are joined with an underscore. The 1st column is the word form, the 2nd its modern equivalent, the 3rd its modern lemma, the 4th its PoS tag from the IMP morphosyntactic specification, and the 5th (where present) the gloss, e.g.: