
 
dc.contributor.author Pori, Eva
dc.contributor.author Knez, Mihaela
dc.contributor.author Klemen, Matej
dc.contributor.author Jerman, Tanja
dc.date.accessioned 2025-11-18T12:21:22Z
dc.date.available 2025-11-18T12:21:22Z
dc.date.issued 2025-11-15
dc.identifier.uri http://hdl.handle.net/11356/2069
dc.description ONTEM 1.0 comprises 1,019 manually prepared entries. Each entry records the lemma, its part-of-speech (following the MULTEXT-East tagset for Slovenian, https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), its CEFR level (based on the Core vocabulary for Slovenian as L2, organized by levels A1, A2, and B1; http://hdl.handle.net/11356/1697), a confirmation of the CEFR level (based on expert validation), and metadata covering the semantic categorization, with detailed descriptions of each semantic category (metatopic, topic, and subtopic), and the source of the word. The words are classified into up to three hierarchically organized levels of semantic categories: 12 top-level categories (metatopics) and 23 topics, the latter further divided into 29 subtopics. All categories are described in more detail in the provided README file. The words in ONTEM 1.0 were sourced from the KUUS corpus (http://hdl.handle.net/11356/1696), which comprises 17 textbooks for Slovenian as a Second and Foreign Language and contains 520,796 words. From this corpus, 1,019 semantically and thematically diverse words were manually selected to represent different parts-of-speech and CEFR levels. The primary focus is on A1 and A2 textbook vocabulary, but higher-level words are also included to build a robust, hierarchically structured system with potential for future expansion. The ontology will be integrated into the Dictionary for Speakers of Slovene as a Second and Foreign Language – SLOGOST (https://lexonomy.cjvt.si/slovar-za-govorce-slovenscine-kot-drugega-in-tujega-jezika/). The dataset is available in CSV format, accompanied by a README document that describes its contents in more detail.
dc.language.iso slv
dc.language.iso eng
dc.publisher Centre for Slovene as a Second and Foreign Language, University of Ljubljana
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/services/projects/#Ontology_of_topics_for_Slovenian_as_a_Second_and_Foreign_Language
dc.subject ontology
dc.subject topic
dc.subject Slovenian as L2
dc.subject Slovenian as second and foreign language
dc.subject ONTEM
dc.subject SLOGOST
dc.title Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType ontology
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Eva Pori eva.pori@ff.uni-lj.si Filozofska fakulteta, Univerza v Ljubljani
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 1019 entries
files.count 2
files.size 61740


 Files in this item

Total size of all files: 60.29 KB

Name: ONTEM-v1-DATA.csv
Size: 44.75 KB
Format: CSV file
Description: Unknown
MD5: 4c5fea38ade5b8815f5aa399b73738d9

Name: ONTEM-v1-README.txt
Size: 15.55 KB
Format: Text file
Description: Unknown
MD5: 7d5d83188865f2b628002c40ad986f51
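
The MD5 checksums listed for the two files can be used to check that a download is intact. The following is a minimal sketch, assuming both files have been saved under their listed names in one local directory; the helper names `md5_of` and `verify` are illustrative and not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums as given in the file listing for this item.
EXPECTED_MD5 = {
    "ONTEM-v1-DATA.csv": "4c5fea38ade5b8815f5aa399b73738d9",
    "ONTEM-v1-README.txt": "7d5d83188865f2b628002c40ad986f51",
}


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (missing).

    Assumption: the files keep the names shown in the listing.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify().items():
        print(f"{name}: {labels[outcome]}")
```

A mismatch most likely indicates an incomplete or altered download; MD5 is suitable here only as an integrity check, not as a security guarantee.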
 File Preview  
README – Ontology of Topics for Slovenian as a Second and Foreign Language ONTEM 1.0
The data in tabular format comprises 8 columns:
A: Lemma / Lema includes a list of 1,019 lemmas from the KUUS corpus. 
B: Part-of-speech / Besedna vrsta provides information about the part-of-speech of the included words following the MULTEXT-East tagset for Slovenian (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html).
C: CEFR level / Raven SEJO provides information on the classification of lemmas according to the CEFR proficiency levels. The assignment is based on Core vocabulary for Slovenian as L2 (http://hdl.handle.net/11356/1697), which organises lexical items into levels A1, A2, and B1. If the lemma is not included in the Core vocabulary for Slovenian as L2, no information is provided in this column.
D: Confirmation of the CEFR level / Potrditev ravni SEJO indicates whether a lemma was validated as belonging to the A1 level. Specialists in Slovenian as a foreign and second language conducted in . . .
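
As an illustration of how the columns described so far might be used, the sketch below tallies lemmas per CEFR level. It rests on several assumptions that the preview does not confirm: the CSV is comma-delimited and UTF-8 encoded, its first row is a header, and column C (CEFR level) is the third field. The function name `count_cefr_levels` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: column C (CEFR level) is the third field, i.e. index 2.
CEFR_COLUMN = 2


def count_cefr_levels(path, delimiter=",", has_header=True):
    """Count rows per CEFR level; rows with an empty level are counted as "(none)".

    The delimiter and the presence of a header row are assumptions,
    so both can be overridden by the caller.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            level = row[CEFR_COLUMN].strip() if len(row) > CEFR_COLUMN else ""
            counts[level or "(none)"] += 1
    return counts


if __name__ == "__main__":
    data_file = Path("ONTEM-v1-DATA.csv")
    if data_file.is_file():
        for level, total in sorted(count_cefr_levels(data_file).items()):
            print(f"{level}: {total}")
```

The "(none)" bucket corresponds to lemmas that are not included in the Core vocabulary for Slovenian as L2, for which column C carries no information.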
                                            
