| dc.contributor.author | Pori, Eva |
| dc.contributor.author | Knez, Mihaela |
| dc.contributor.author | Klemen, Matej |
| dc.contributor.author | Jerman, Tanja |
| dc.date.accessioned | 2025-11-18T12:21:22Z |
| dc.date.available | 2025-11-18T12:21:22Z |
| dc.date.issued | 2025-11-15 |
| dc.identifier.uri | http://hdl.handle.net/11356/2069 |
| dc.description | ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian, https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), CEFR level (based on the Core vocabulary for Slovenian as L2, organized by levels A1, A2, and B1; http://hdl.handle.net/11356/1697), confirmation of the CEFR level (based on expert validation), as well as metadata including information about the semantic categorization with detailed descriptions of each semantic category (metatopic, topic, and subtopic) and the source of the word. The words are classified into up to three levels of hierarchically organised semantic categories: into 12 top-level categories, i.e. metatopics, and 23 topics, the latter further divided into 29 subtopics. All categories are described in more detail in the provided README file. The words in ONTEM 1.0 were sourced from the KUUS corpus (http://hdl.handle.net/11356/1696) which comprises 17 textbooks for Slovenian as a Second and Foreign Language and contains 520,796 words. From this corpus, 1,019 semantically and thematically diverse words were manually selected to represent different parts-of-speech and CEFR levels, with a primary focus on A1 and A2 textbook vocabulary, while also including higher-level words to build a robust hierarchically structured system with potential for future expansion. The ontology will be integrated into the Dictionary for Speakers of Slovene as a Second and Foreign Language – SLOGOST (https://lexonomy.cjvt.si/slovar-za-govorce-slovenscine-kot-drugega-in-tujega-jezika/). The dataset is available in CSV format, accompanied by a README document that describes its contents in more detail. |
| dc.language.iso | slv |
| dc.language.iso | eng |
| dc.publisher | Centre for Slovene as a Second and Foreign Language, University of Ljubljana |
| dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://www.clarin.si/info/services/projects/#Ontology_of_topics_for_Slovenian_as_a_Second_and_Foreign_Language |
| dc.subject | ontology |
| dc.subject | topic |
| dc.subject | Slovenian as L2 |
| dc.subject | Slovenian as second and foreign language |
| dc.subject | ONTEM |
| dc.subject | SLOGOST |
| dc.title | Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0 |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.detailedType | ontology |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN.SI data & tools |
| contact.person | Eva Pori; eva.pori@ff.uni-lj.si; Filozofska fakulteta, Univerza v Ljubljani |
| sponsor | Jožef Stefan Institute; CLARIN; CLARIN.SI; nationalFunds |
| size.info | 1019 entries |
| files.count | 2 |
| files.size | 61740 bytes |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Name: ONTEM-v1-DATA.csv
- Size: 44.75 KB
- Format: CSV file
- Description: Unknown
- MD5: 4c5fea38ade5b8815f5aa399b73738d9

- Name: ONTEM-v1-README.txt
- Size: 15.55 KB
- Format: Text file
- Description: Unknown
- MD5: 7d5d83188865f2b628002c40ad986f51
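The MD5 values in the file listing can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library. It assumes both files were saved under their listed names in a single directory. The helper names `md5_of` and `verify_downloads` are illustrative and not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "ONTEM-v1-DATA.csv": "4c5fea38ade5b8815f5aa399b73738d9",
    "ONTEM-v1-README.txt": "7d5d83188865f2b628002c40ad986f51",
}


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    # Assumption: the files sit in the current working directory.
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify_downloads().items():
        print(f"{name}: {labels[outcome]}")
```

A mismatch usually points to an incomplete download or a file that was modified after retrieval, for example by a spreadsheet program re-saving the CSV.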
README – Ontology of Topics for Slovenian as a Second and Foreign Language ONTEM 1.0
The data in tabular format comprises 8 columns:
A: Lemma / Lema includes a list of 1,019 lemmas from the KUUS corpus.
B: Part-of-speech / Besedna vrsta provides information about the part-of-speech of the included words following the MULTEXT-East tagset for Slovenian (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html).
C: CEFR level / Raven SEJO provides information on the classification of lemmas according to the CEFR proficiency levels. The assignment is based on Core vocabulary for Slovenian as L2 (http://hdl.handle.net/11356/1697), which organises lexical items into levels A1, A2, and B1. If the lemma is not included in the Core vocabulary for Slovenian as L2, no information is provided in this column.
D: Confirmation of the CEFR level / Potrditev ravni SEJO indicates whether a lemma was validated as belonging to the A1 level. Specialists in Slovenian as a foreign and second language conducted in . . .
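As an illustration of the column layout described so far, the sketch below counts lemmas per CEFR level by reading column C by position. Several details are assumptions, not facts from the README: that the CSV is comma-delimited, that it begins with a header row, and that column C is the third field. The function name `cefr_counts` and the sample rows are hypothetical placeholders, not real ONTEM entries.

```python
import csv
import io
from collections import Counter


def cefr_counts(csv_file, has_header=True, delimiter=","):
    """Count rows per CEFR level, taking the level from the third column (C).

    Rows with an empty CEFR cell (lemma not in the Core vocabulary) are
    counted under the key "(none)".
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the assumed header row
    counts = Counter()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        level = row[2].strip() if len(row) > 2 else ""
        counts[level or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder rows for demonstration only; not taken from the dataset.
    sample = io.StringIO(
        "Lemma,POS,CEFR\n"
        "word1,X,A1\n"
        "word2,X,\n"
        "word3,X,A2\n"
    )
    for level, count in sorted(cefr_counts(sample).items()):
        print(level, count)
```

To run this against the real file, one would open it with `open("ONTEM-v1-DATA.csv", newline="", encoding="utf-8")` and adjust `delimiter` or `has_header` if the actual layout differs from these assumptions.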