dc.contributor.author | Erjavec, Tomaž |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Fišer, Darja |
dc.date.accessioned | 2020-05-05T10:32:52Z |
dc.date.available | 2020-05-05T10:32:52Z |
dc.date.issued | 2020-05-05 |
dc.identifier.uri | http://hdl.handle.net/11356/1263 |
dc.description | KAS-biterm is an automatically generated glossary of English terms with their translations into Slovene. The pairs, possibly with their English and Slovene acronyms, were extracted from the Corpus of Academic Slovene KAS 1.0 (http://hdl.handle.net/11356/1244), where they were annotated with the kas-biterm tool (https://github.com/clarinsi/kas-biterm) trained on the Bilingual terminology extraction dataset KAS-biterm 1.0 (http://hdl.handle.net/11356/1199). Note that only Query 1 was used for pre-selecting the sentences and for training the tool, and that the bilingual terms from the KAS corpus have been filtered to remove noise. The glossary is encoded in TEI-Lex0 (https://github.com/DARIAH-ERIC/lexicalresources) and also gives, for each entry, up to three examples of use together with their bibliographic information. Various parts of the lexical entries link to the corresponding queries in the CLARIN.SI noSketch Engine concordancer. The TEI-encoded glossary is also available in a much smaller variant that omits the examples of use and the links. |
dc.language.iso | slv |
dc.language.iso | eng |
dc.publisher | Jožef Stefan Institute |
dc.relation.isreferencedby | http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Ljubesic-et-al_KAS-term-and-KAS-biterm-Datasets-and-baselines-for-monolingual-and-bilingual-terminology-extraction-from-academic-writing.pdf |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://nl.ijs.si/kas/ |
dc.subject | terminology |
dc.subject | PhD theses |
dc.subject | MSc/MA theses |
dc.subject | BSc/BA theses |
dc.subject | academic writing |
dc.subject | TEI |
dc.subject | scientific texts |
dc.title | English-Slovene term candidates KAS-biterm 1.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | terminologicalResource |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor | ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
sponsor | ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds |
size.info | 133710 entries |
files.count | 1 |
files.size | 53206123 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

- Name: KAS-biterm.TEI.zip
- Size: 50.74 MB
- Format: application/zip
- Description: Glossary in TEI Lex0 format
- MD5: b61a346c790ffada98232670cc8481db
- KAS-biterm.TEI
- kas-biterm.xml (460 MB)
- kas-biterm.nocit.xml (79 MB)
- Schema
- TEILex0.rng (272 kB)
- TEILex0-ODD.xml (159 kB)
- TEILex0.rnc (119 kB)
- TEILex0.dtd (97 kB)
- TEILex0.sch (482 B)
- 00README.txt (535 B)
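
The MD5 value listed for KAS-biterm.TEI.zip can be used to check that a download arrived intact. The following is a minimal sketch, not part of the original record: the function names `md5_of_file` and `matches_published_md5` are hypothetical, and the script assumes the archive has been saved under its listed name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum published in this record for KAS-biterm.TEI.zip.
EXPECTED_MD5 = "b61a346c790ffada98232670cc8481db"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 with the expected value, ignoring case."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current directory.
    archive = Path("KAS-biterm.TEI.zip")
    if archive.exists():
        print("checksum OK" if matches_published_md5(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a safeguard against deliberate tampering.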