
 
dc.contributor.author Supej, Anka
dc.contributor.author Ulčar, Matej
dc.contributor.author Robnik-Šikonja, Marko
dc.contributor.author Pollak, Senja
dc.date.accessioned 2021-04-06T21:14:46Z
dc.date.available 2021-04-06T21:14:46Z
dc.date.issued 2020-09-24
dc.identifier.uri http://hdl.handle.net/11356/1347
dc.description The list of single-word occupations in Slovene is based on the Slovene Standard Classification of Occupations (https://www.uradni-list.si/glasilo-uradni-list-rs/vsebina?urlid=199728&stevilka=1641). The list includes 234 occupation pairs. For each occupation, it contains the masculine word form (e.g. fotograf), its possible synonym, the feminine equivalent (e.g. fotografka) and the corresponding synonym of the feminine form (e.g. fotografinja). Cases where no synonym was added for an occupation are denoted with the label 0 (note that only synonyms with the same root are considered). The following conditions were applied when deciding whether to include an occupation in the list:
- The list contains only single-word occupation pairs, whereas most occupations in the aforementioned classification are multi-word expressions.
- An occupation has to exist in both feminine and masculine grammatical gender (gender-neutral words such as pismonoša [en. postman] are not included).
- At least one variant of an occupation (masculine or feminine) occurs at least 500 times in the Corpus of Written Standard Slovene Gigafida 2.0.
- Occupations that are also proper names in Slovene, e.g. kovač [en. blacksmith], were filtered out if the proper-name form exists in the Slovene Morphological Lexicon Sloleks 2.0 (Dobrovoljc et al., 2019).
- Occupations that could easily be associated with a context unrelated to occupations (e.g. čarovnik/čarovnica [en. wizard/witch]), or whose masculine or feminine variant is a homograph of a common noun (e.g. detektivka [en. detective] also denotes a detective novel), were excluded from the final set.
When a more established version of an occupation exists, a synonym with the same root was added manually (e.g. for fotografka, the arguably more established fotografinja [en. photographer] was added). If the standard classification does not include the feminine (e.g. dramatik [en. playwright]) or the masculine version (e.g. prostitutka [en. prostitute]) of an occupation, the missing version was added manually if it exists and appears in the Gigafida corpus (there are, for example, no established words for the feminine version of postrešček [en. porter] or the masculine version of hostesa [en. hostess]). The list of occupations can be used for various natural language processing tasks, including the evaluation of word embedding models through analogies, which can point to bias in language use. If you use the dataset, please cite the following paper: SUPEJ, Anka, ULČAR, Matej, ROBNIK ŠIKONJA, Marko, POLLAK, Senja (2020). Primerjava slovenskih besednih vektorskih vložitev z vidika spola na analogijah poklicev [en. Comparison of Slovene word embeddings from the gender perspective on occupation analogies]. Zbornik konference Jezikovne tehnologije in digitalna humanistika / Proc. of the Conference on Language Technologies and Digital Humanities, pp. 93-100.
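The analogy use mentioned above can be illustrated with a minimal sketch of the generic vector-offset (3CosAdd) method: any two masculine/feminine pairs yield a question a : b :: c : ?, answered with the vocabulary word whose vector is most cosine-similar to b - a + c. This is not claimed to be the evaluation procedure of the cited paper. The toy three-dimensional vectors, the illustrative words učitelj/učiteljica and šola, and all function names are hypothetical; a real evaluation would load vectors from a trained Slovene embedding model.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vectors (dimensions: occupation A, occupation B, "feminine" direction).
# Real use would load vectors from a trained Slovene embedding model instead.
TOY_VECTORS: Dict[str, List[float]] = {
    "fotograf": [1.0, 0.0, 0.0],
    "fotografinja": [1.0, 0.0, 1.0],
    "učitelj": [0.0, 1.0, 0.0],
    "učiteljica": [0.0, 1.0, 1.0],
    "šola": [0.2, 1.0, 0.0],  # distractor word, not an occupation
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_analogies(pairs: Sequence[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Turn (masculine, feminine) pairs into questions m1 : f1 :: m2 : ? with expected answer f2."""
    return [(m1, f1, m2, f2) for (m1, f1), (m2, f2) in itertools.permutations(pairs, 2)]


def solve_3cosadd(a: str, b: str, c: str, vectors: Dict[str, List[float]]) -> str:
    """Return the vocabulary word closest to b - a + c, excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(pairs: Sequence[Tuple[str, str]], vectors: Dict[str, List[float]]) -> float:
    """Share of analogy questions (all four words in the vocabulary) answered correctly."""
    questions = [q for q in build_analogies(pairs) if all(w in vectors for w in q)]
    if not questions:
        return 0.0
    correct = sum(solve_3cosadd(a, b, c, vectors) == d for a, b, c, d in questions)
    return correct / len(questions)
```

On the toy vectors, `analogy_accuracy([("fotograf", "fotografinja"), ("učitelj", "učiteljica")], TOY_VECTORS)` answers both questions correctly. Questions with out-of-vocabulary words are skipped rather than counted as errors, which is one common convention; counting them as failures is an equally defensible choice.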
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_Supej-et-al_Primerjava-slovenskih-besednih-vektorskih-vlozitev-z-vidika-spola-na-analogijah-poklicev.pdf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://embeddia.eu/
dc.subject occupations
dc.subject word analogies
dc.subject gender
dc.title List of single-word male and female occupations in Slovenian
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Anka Supej a.supej@gmail.com Jožef Stefan Institute
contact.person Senja Pollak senja.pollak@ijs.si Jožef Stefan Institute
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
size.info 234 entries
files.count 1
files.size 6083


Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: Male_and_female_occupations_Slovene.csv
Size: 5.94 KB
Format: CSV file
Description: List of pairs of male and female occupations in Slovene
MD5: e8f6938cfddd479adb909753b472541f
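Before using the downloaded file, its integrity can be checked against the MD5 checksum listed above, and its rows can be parsed into records in which the label 0 (used in the description for "no synonym added") becomes a missing value. This is a minimal sketch under stated assumptions: the record names the four fields but not their column order, the delimiter, the encoding, or whether the file has a header row, so the order masculine, masculine synonym, feminine, feminine synonym, the comma delimiter and the helper names `md5_matches` and `parse_occupations` are all hypothetical.

```python
import csv
import hashlib
import io
from typing import Dict, List, Optional

# Checksum published in the record for Male_and_female_occupations_Slovene.csv.
EXPECTED_MD5 = "e8f6938cfddd479adb909753b472541f"
# Label the record uses to mark "no synonym added".
NO_SYNONYM = "0"


def md5_matches(data: bytes, expected: str = EXPECTED_MD5) -> bool:
    """Compare the MD5 hex digest of the downloaded bytes with the expected checksum."""
    return hashlib.md5(data).hexdigest() == expected.lower()


def parse_occupations(text: str, delimiter: str = ",",
                      skip_header: bool = False) -> List[Dict[str, Optional[str]]]:
    """Parse the occupation list into dictionaries, mapping the label 0 to None.

    ASSUMPTION (hypothetical): four columns in the order masculine, masculine
    synonym, feminine, feminine synonym; the real delimiter and header may differ.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if skip_header:
        next(reader, None)  # drop the header row, if the file has one
    rows = []
    for record in reader:
        if len(record) < 4:
            continue  # skip blank or malformed lines
        masc, masc_syn, fem, fem_syn = (field.strip() for field in record[:4])
        rows.append({
            "masculine": masc,
            "masculine_synonym": None if masc_syn == NO_SYNONYM else masc_syn,
            "feminine": fem,
            "feminine_synonym": None if fem_syn == NO_SYNONYM else fem_syn,
        })
    return rows
```

To use it, read the file in binary mode, check `md5_matches(raw)`, and pass `raw.decode("utf-8")` (UTF-8 is also an assumption) to `parse_occupations`, adjusting `delimiter` and `skip_header` after inspecting the file.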
