dc.contributor.author | Pollak, Senja |
dc.contributor.author | Vulić, Ivan |
dc.contributor.author | Pelicon, Andraž |
dc.contributor.author | Repar, Andraž |
dc.contributor.author | Armendariz, Carlos |
dc.contributor.author | Purver, Matthew |
dc.contributor.author | Ljubešić, Nikola |
dc.date.accessioned | 2021-03-09T08:38:40Z |
dc.date.available | 2021-03-09T08:38:40Z |
dc.date.issued | 2020-05-15 |
dc.identifier.uri | http://hdl.handle.net/11356/1309 |
dc.description | The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, two translators first translated the word pairs independently. For the pairs where their translations differed, the final translation was then agreed in a consensus meeting. The translators also had access to the Croatian SimLex-999 translations (Mrkšić et al. 2017) and received translation guidelines (see guidelines_translators.txt) inspired by the guidelines of Multi-SimLex (Vulić et al. 2020). The resource was used to build the CoSimLex resource (Armendariz et al. 2020). The list contains the original English word pairs (Word1 and Word2) and their part of speech, followed by the Slovene translations (Trans1 and Trans2). The last column, Comment, marks special cases:
- "multiword_translation" -> although the translators were asked to opt for single-word equivalents, in some cases the only appropriate translation was a multi-word expression (for example, "birthday" -> "rojstni dan").
- "no_translation" -> pairs without a proper translation, i.e. the translated pair contains two identical words. Although the translators were asked to find two different translations for the words, in a few examples this was not possible. For example, for the English pair "taxi" and "cab", only "taksi" was considered a good Slovene equivalent.
- "duplicated_translation" -> where the same Slovene pair is the translation of two different English original pairs, both occurrences are marked as duplicate translations.
- "duplicated_original" -> in one case, the original English word pair was itself duplicated. This is also marked.
Cite: If you use the dataset, please cite the CLARIN handle (http://hdl.handle.net/11356/1309) and Armendariz et al. (2020), listed under References.
References:
Armendariz, Carlos Santos, Purver, Matthew, Ulčar, Matej, Pollak, Senja, Ljubešić, Nikola, Granroth-Wilding, Mark, and Vaik, Kristiina (2020). CoSimLex: A Resource for Evaluating Graded Word Similarity in Context. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 5878–5886. https://www.aclweb.org/anthology/2020.lrec-1.720/
Hill, Felix, Reichart, Roi, and Korhonen, Anna (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4):665–695. https://www.aclweb.org/anthology/J15-4004/
Mrkšić, Nikola, Vulić, Ivan, Ó Séaghdha, Diarmuid, Leviant, Ira, Reichart, Roi, Gašić, Milica, Korhonen, Anna, and Young, Steve (2017). Semantic specialisation of distributional word vector spaces using monolingual and cross-lingual constraints. Transactions of the ACL, 5:309–324. https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00063
Vulić, Ivan, Baker, Simon, Ponti, Edoardo Maria, Petti, Ulla, Leviant, Ira, Wing, Kelly, Majewska, Olga, Bar, Eden, Malone, Matt, Poibeau, Thierry, Reichart, Roi, and Korhonen, Anna (2020). Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity. Computational Linguistics. https://doi.org/10.1162/coli_a_00391 |
dc.language.iso | slv |
dc.language.iso | eng |
dc.publisher | University of Ljubljana |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825153 |
dc.relation.isreferencedby | https://www.aclweb.org/anthology/2020.lrec-1.720/ |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | http://embeddia.eu/ |
dc.subject | similarity |
dc.subject | word embeddings |
dc.subject | evaluation |
dc.title | SimLex-999 Slovenian translation SimLex-999-sl 1.0 |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | other |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Senja Pollak senja.pollak@ijs.s University of Ljubljana |
sponsor | European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153 |
size.info | 999 entries |
files.count | 3 |
files.size | 38193 |
Files in this item (37.3 KB in total)
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).



- Name
- SimLex-999_Slovene.csv
- Size
- 31.57 KB
- Format
- CSV file
- Description
- Slovene translations of the SimLex-999 word pairs
- MD5
- 2ceccfa2f3847c7f9941a53b907989aa
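The column layout described in this record (Word1, Word2, part of speech, Trans1, Trans2, Comment) can be read with Python's standard `csv` module. The sketch below is a minimal illustration under stated assumptions, not an official loader: it assumes a header row, UTF-8 text, a comma delimiter, and a part-of-speech column named `POS` (a hypothetical name; check the real header and pass another `delimiter` if the file uses one). All helper names are hypothetical, and the two sample rows are illustrative, built around the taxi/cab example from the description rather than copied from the file.

```python
import csv
import io
from collections import Counter

# Assumed header. Word1, Word2, Trans1, Trans2 and Comment are named in the
# record; "POS" for the part-of-speech column is a hypothetical placeholder.
EXPECTED_COLUMNS = ["Word1", "Word2", "POS", "Trans1", "Trans2", "Comment"]

# Illustrative rows only (not copied from SimLex-999_Slovene.csv).
SAMPLE = """Word1,Word2,POS,Trans1,Trans2,Comment
taxi,cab,N,taksi,taksi,no_translation
old,new,A,star,nov,
"""


def load_pairs(text, delimiter=","):
    """Parse CSV text into a list of dicts, one per word pair."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def flag(row):
    """Return the special-case flag of a row, or "" if Comment is empty."""
    return (row.get("Comment") or "").strip()


def comment_counts(rows):
    """Count rows per special-case flag; rows without a flag count as "none"."""
    return Counter(flag(row) or "none" for row in rows)


def no_translation_consistent(rows):
    """Check that rows flagged no_translation have identical Trans1 and Trans2."""
    return all(
        row["Trans1"].strip() == row["Trans2"].strip()
        for row in rows
        if flag(row) == "no_translation"
    )


def duplicated_translations(rows):
    """Return (Trans1, Trans2) pairs that occur for more than one English pair.

    Pairs are compared in order; whether the dataset treats (a, b) and (b, a)
    as the same pair is not stated, so this is an assumption.
    """
    counts = Counter((row["Trans1"].strip(), row["Trans2"].strip()) for row in rows)
    return {pair for pair, n in counts.items() if n > 1}


if __name__ == "__main__":
    print(comment_counts(load_pairs(SAMPLE)))
```

To load the real file, read it with `open("SimLex-999_Slovene.csv", encoding="utf-8", newline="")` and pass the text to `load_pairs`. The flag checks mirror the Comment codes in the description: `no_translation` rows should carry identical translations, and `duplicated_translations` lists candidate `duplicated_translation` pairs to compare against the flags.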

- Name
- description_resource.txt
- Size
- 3.43 KB
- Format
- Text file
- Description
- Text file with the description of the resource
- MD5
- 1230123664a7654fbb5d89a4e988e57c
Slovene SimLex-999 (Pollak, Senja; Vulić, Ivan; Pelicon, Andraž; Repar, Andraž; Armendariz, Carlos; Purver, Matthew; Ljubešić, Nikola). Description of the resource: The list contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, two translators first translated the word pairs independently. For the pairs where their translations differed, the final translation was then agreed in a consensus meeting. The translators also had access to the Croatian SimLex-999 translations (Mrkšić et al. 2017) and received translation guidelines (see guidelines_translators.txt) inspired by the guidelines of Multi-SimLex (Vulić et al. 2020). The resource was used to build the CoSimLex resource (Armendariz et al. 2020). The list contains the original English word pairs (Word1 and Word2) and their part of speech, followed by the Slovene translations (Trans1 and Trans2). The last column, Comment, marks special cases: - "multiword_translation" -> translators . . .

- Name
- guidelines_translators.txt
- Size
- 2.29 KB
- Format
- Text file
- Description
- Text file with the instructions to the translators
- MD5
- 95745f91bd0ef3cb5fbe87f1dc0a03dd
Guidelines for translators:
a) Guidelines for individual translations: For the first set of translations, performed by each translator separately, the guidelines were as follows:
- It is not obligatory to use the same target translation for the same source word in different word pairs.
- Flag any difficult translations, and add a general comment if you have doubts or any remarks.
- If you cannot find two distinct target words for the two source words, please flag the pair as difficult and add a comment.
- If you find yourself using an identical translation for two pairs, please add a comment.
- You can use the provided Croatian translation as an additional source.
- If gender is not marked in English (e.g. cat), try to pick the most natural one in Slovene; if there is no clear gender interpretation, follow the Croatian one.
- Translate word pairs (and not single words): the constituent words in a pair act as a disambiguation signal. Regarding multiple senses, try to pick the one that is most true to the . . .
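Each file entry above lists an MD5 checksum. A minimal sketch for checking downloaded copies with Python's `hashlib`, assuming the three files were saved under their listed names in a single directory (the directory argument and the helper names `md5_of_file` and `verify_downloads` are hypothetical):

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for each file in this record.
EXPECTED_MD5 = {
    "SimLex-999_Slovene.csv": "2ceccfa2f3847c7f9941a53b907989aa",
    "description_resource.txt": "1230123664a7654fbb5d89a4e988e57c",
    "guidelines_translators.txt": "95745f91bd0ef3cb5fbe87f1dc0a03dd",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each expected file name to True (match), False (mismatch) or None (absent)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

For example, `verify_downloads(".")` returns `True` for each file in the current directory whose checksum matches, `False` for a mismatch, and `None` for a file that is missing. Reading in chunks keeps memory use constant, although these files are small enough to hash in one read.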