dc.contributor.author Krek, Simon
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Robnik-Šikonja, Marko
dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Gantar, Polona
dc.contributor.author Čibej, Jaka
dc.contributor.author Gorjanc, Vojko
dc.contributor.author Klemenc, Bojan
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Pori, Eva
dc.contributor.author Roblek, Rebeka
dc.contributor.author Zgaga, Karolina
dc.date.accessioned 2024-02-06T11:37:19Z
dc.date.available 2024-02-06T11:37:19Z
dc.date.issued 2023-11-15
dc.identifier.uri http://hdl.handle.net/11356/1916
dc.description The Thesaurus of Modern Slovene is the largest automatically generated open-access collection of Slovene synonyms. It draws on two principal language resources: the Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida 1.0 corpus of written Slovene. The links identified between synonyms were additionally confirmed using the Dictionary of Standard Slovenian Language (SSKJ). Data extraction and the structure of the Thesaurus are based on the frequency and manner in which words co-occur in translation strings of the Oxford-DZS Dictionary. This information is the basis for discriminating between ‘core’ and ‘near’ synonyms, with ‘core’ synonyms exhibiting a stronger connection to the keyword. In the following step, an approach combining balanced co-occurrence graphs and the Personalized PageRank algorithm automatically divides the synonyms into subgroups and ranks them according to their degree of semantic relatedness to the keyword, as well as their frequency in language use. For the creation methodology, see Krek et al. (2017) in the provided references. The database includes dictionary entries: single- and multiword headwords, their part of speech and other linguistic features, as well as automatically extracted synonyms, their type (core or near) and relevancy rank. In version 2.0, 4,544 manually revised antonyms were added to the database. Additionally, for part of the database, synonyms were distributed under the corresponding word senses.
Depending on how much lexicographic revision was involved in their preparation, database entries have one of the following three statuses: (a) ssss-automatic (96,064 entries): no manual revision was conducted; (b) ssss-manual (3,421 entries): word senses and semantic indicators were prepared by lexicographers, and synonyms were manually distributed under each corresponding sense; (c) ssss-hybrid (1,352 entries): manually revised senses are combined with automatically compiled data. For the novelties of v2.0, see Arhar Holdt et al. (2023) in the provided references.
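The ranking step mentioned in the description can be illustrated with a minimal sketch of Personalized PageRank on a small weighted co-occurrence graph. This is not the authors' implementation: the graph, its edge weights, the node labels (`keyword`, `syn_a`, `syn_b`, `syn_c`), the damping factor and the iteration count are all assumptions made for illustration, and the construction of the balanced co-occurrence graphs described in Krek et al. (2017) is not covered.

```python
def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power iteration for PageRank with all restart mass placed on `seed`.

    `graph` maps each node to a dict {neighbour: edge weight}; every
    neighbour must itself be a key of `graph`.
    """
    rank = {node: (1.0 if node == seed else 0.0) for node in graph}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if total == 0:
                # Dangling node: send its mass back to the seed.
                new_rank[seed] += damping * rank[node]
                continue
            for neighbour, weight in neighbours.items():
                # Spread this node's score in proportion to edge weight.
                new_rank[neighbour] += damping * rank[node] * weight / total
        # Restart: the remaining probability mass returns to the keyword.
        new_rank[seed] += 1.0 - damping
        rank = new_rank
    return rank


# Hypothetical toy graph: weights stand in for co-occurrence strength.
toy_graph = {
    "keyword": {"syn_a": 3.0, "syn_b": 1.0},
    "syn_a": {"keyword": 3.0, "syn_b": 1.0},
    "syn_b": {"keyword": 1.0, "syn_a": 1.0, "syn_c": 1.0},
    "syn_c": {"syn_b": 1.0},
}

scores = personalized_pagerank(toy_graph, seed="keyword")
# Candidate synonyms ordered by their score relative to the keyword.
ranked = sorted(
    (node for node in toy_graph if node != "keyword"),
    key=lambda node: scores[node],
    reverse=True,
)
```

In this toy setup the candidate most strongly tied to the keyword ends up first in `ranked`, which mirrors the idea of ordering synonyms by relatedness to the keyword; the real resource additionally takes corpus frequency into account.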
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://elex.link/elex2023/wp-content/uploads/82.pdf
dc.relation.isreferencedby https://elex.link/elex2017/wp-content/uploads/2017/09/paper05.pdf
dc.relation.replaces http://hdl.handle.net/11356/1166
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://viri.cjvt.si/sopomenke/eng/about
dc.subject thesaurus
dc.subject synonyms
dc.subject antonyms
dc.title Thesaurus of Modern Slovene 2.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType thesaurus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri http://viri.cjvt.si/sopomenke/eng
contact.person Simon Krek simon.krek@guest.arnes.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor University of Ljubljana P6-0215 Slovene Language - Basic, Contrastive, and Applied Studies nationalFunds
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
sponsor Republic of Slovenia, Ministry of Culture 3340-21-722002 Upgrading fundamental dictionary resources and databases of CJVT UL nationalFunds
sponsor Republic of Slovenia, Ministry of Culture JR-NPJP-22-23 Upgrading language portals at CJVT nationalFunds
size.info 100837 entries
files.count 1
files.size 11037589


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name CJVT-Thesaurus-of-Modern-Slovene-v2.zip
Size 10.53 MB
Format application/zip
Description Thesaurus of Modern Slovene 2.0 Database and Schema
MD5 2f6f0ffcdb4ba6b6e99a006becde514c
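The MD5 value listed for the file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_expected` and the chunk size are illustrative choices, and the local path passed in is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed in this record for the archive.
EXPECTED_MD5 = "2f6f0ffcdb4ba6b6e99a006becde514c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("CJVT-Thesaurus-of-Modern-Slovene-v2.zip")` should return `True` for an uncorrupted copy of the archive saved under that name in the working directory.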