
 
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2015-05-25T19:41:04Z
dc.date.available 2015-05-25T19:41:04Z
dc.date.issued 2014-09-13
dc.identifier.uri http://hdl.handle.net/11356/1032
dc.description The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora and contains attested and manually verified word forms and their annotations, with examples of use. A lexicon entry gives the modern lemma with its part of speech and, for archaic words, its gloss (the closest modern equivalent(s) or a short explanation of its meaning). The lemma is followed by those of its modern word forms that occur in the corpus (i.e. the complete paradigm of the lemma is not given), and each of these lists all its attested historical word forms with examples of usage. The lexicon is available in the source TEI P5 XML format and in a much smaller and simpler derived tabular format, which does not contain the usage examples. In the tabular format, multi-word units are joined with an underscore. The 1st column is the word form, the 2nd its modern equivalent, the 3rd its modern lemma, the 4th its PoS tag from the IMP morphosyntactic specification, and the 5th (where present) the gloss, e.g.: ako_ravno<TAB>akoravno<TAB>akoravno<TAB>C<TAB>čeprav<LF> or ak-li<TAB>ako_li<TAB>ako_li<TAB>C_Q<TAB><LF>
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation info:eu-repo/grantAgreement/EC/FP7/215064
dc.relation.isreferencedby https://doi.org/10.1007/s10579-015-9294-7
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://nl.ijs.si/imp/index-en.html
dc.subject historical language
dc.subject modernisation
dc.subject lemmatisation
dc.subject TEI
dc.title Lexicon of historical Slovene imp25k 1.1
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://nl.ijs.si/imp/imp25k/html-s/
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor EU FP7-ICT-215064 IMPACT "Improving Access to Text" euFunds info:eu-repo/grantAgreement/EC/FP7/215064
sponsor Google Inc. Google research award Developing Language Models of Historical Slovene Other
size.info 28034 entries
files.count 2
files.size 26650978
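The derived tabular format described in dc.description (tab-separated columns, multi-word units joined with an underscore, a gloss column that may be empty) can be read with a few lines of Python. The sketch below is illustrative only and is not distributed with the lexicon: the names LexRow, parse_line, split_mwu, iter_rows, load_from_zip and group_by_lemma are hypothetical, the member path IMP-lexicon/imp25k.txt is taken from the file preview of IMP-lexicon-txt.zip, and the UTF-8 encoding and the absence of header lines in imp25k.txt are assumptions that should be checked against 00README.txt.

```python
import io
import zipfile
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


@dataclass(frozen=True)
class LexRow:
    """One line of the derived tabular format (hypothetical container)."""
    form: str             # col 1: attested word form
    modern: str           # col 2: modern equivalent
    lemma: str            # col 3: modern lemma
    pos: str              # col 4: PoS tag (IMP morphosyntactic specification)
    gloss: Optional[str]  # col 5: gloss; None when the column is absent or empty


def parse_line(line: str) -> LexRow:
    """Split one tab-separated line into its columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    form, modern, lemma, pos = fields[:4]
    gloss = fields[4] if len(fields) > 4 and fields[4] else None
    return LexRow(form, modern, lemma, pos, gloss)


def split_mwu(token: str) -> List[str]:
    """Multi-word units are joined with '_'; split one back into its words."""
    return token.split("_")


def iter_rows(lines: Iterable[str]) -> Iterator[LexRow]:
    """Parse every non-blank line (assumes the file has no header lines)."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


def load_from_zip(zip_path, member: str = "IMP-lexicon/imp25k.txt") -> List[LexRow]:
    """Read the table directly from the downloaded archive.

    The member path comes from the file preview; UTF-8 is an assumption.
    """
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as raw, \
            io.TextIOWrapper(raw, encoding="utf-8") as text:
        return list(iter_rows(text))


def group_by_lemma(rows: Iterable[LexRow]) -> Dict[Tuple[str, str], Dict[str, Set[str]]]:
    """Regroup rows as (lemma, PoS) -> modern form -> set of attested forms."""
    tree: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        tree[(row.lemma, row.pos)][row.modern].add(row.form)
    return tree
```

Applied to the two sample lines from the description, iter_rows yields one row whose gloss is "čeprav" and one whose gloss is None, and split_mwu("ako_li") returns ["ako", "li"]. Grouping on the (lemma, PoS) pair rather than on the lemma alone is a design choice meant to mirror the entry structure described above (modern lemma with its part of speech, then its modern forms, then their historical forms); it only approximates the TEI entries, since the tabular format omits the usage examples.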


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0); attribution is required.
Name: IMP-lexicon-tei.zip
Size: 24.94 MB
Format: application/zip
Description: Source TEI P5
MD5: 55ddb7a0939c6eda8b836bbdbf3deeda
Archive contents:
  • IMP-lexicon
    • imp25k-header-sl.html (120 kB)
    • imp25k.xml (166 MB)
    • imp25k-header-en.html (119 kB)
    • schema
      • tei_imp.zip (48 kB)
      • tei_imp.rnc (192 kB)
      • tei_imp.dtd (161 kB)
      • tei_imp_doc.html (2 MB)
      • tei_imp_doc.pdf (1 MB)
      • tei_imp_schema.xml (4 kB)
      • tei_imp.rng (421 kB)
      • imp-page.dtd (1 kB)
    • 00README.txt (2 kB)
Name: IMP-lexicon-txt.zip
Size: 487.24 KB
Format: application/zip
Description: Derived tabular format
MD5: c23f34246ad609b1bfa32b463019f188
Archive contents:
  • IMP-lexicon
    • imp25k.txt (2 MB)
    • imp25k-header-sl.html (120 kB)
    • imp25k-header-en.html (119 kB)
    • 00README.txt (2 kB)
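Each archive is listed with an MD5 checksum, so a download can be checked before use. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of and verify, and the assumption that both archives were saved in one local directory under their original names, are illustrative and not part of the distribution. MD5 only guards against accidental corruption, not deliberate tampering.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Checksums as listed for the two archives in this record.
EXPECTED_MD5 = {
    "IMP-lexicon-tei.zip": "55ddb7a0939c6eda8b836bbdbf3deeda",
    "IMP-lexicon-txt.zip": "c23f34246ad609b1bfa32b463019f188",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so it need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".") -> Dict[str, Optional[bool]]:
    """Map each archive name to True (match), False (mismatch) or None (missing).

    Hypothetical helper: assumes the archives keep their original file names.
    """
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results
```

Run in the download directory, verify() should report True for each archive that is present; False points to a corrupted or incomplete download.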
