Files in this item
All files in item: 6.08 MB
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

- Name
- monolingual_analogies.zip
- Size
- 716.43 KB
- Format
- application/zip
- Description
- 9 monolingual datasets
- MD5
- dc8300f78297f44ca4a2c7746b5b97e2

- Name
- crosslingual_analogies.zip
- Size
- 5.38 MB
- Format
- application/zip
- Description
- 72 cross-lingual datasets
- MD5
- b1f572cfad6f02a15ed5eab317e41723
- hr-fi-analogies.txt (544 kB)
- lv-sv-analogies.txt (574 kB)
- lt-en-analogies.txt (568 kB)
- hr-ru-analogies.txt (769 kB)
- ru-hr-analogies.txt (769 kB)
- et-fi-analogies.txt (526 kB)
- sv-sl-analogies.txt (543 kB)
- en-fi-analogies.txt (539 kB)
- sl-fi-analogies.txt (555 kB)
- lv-lt-analogies.txt (612 kB)
- et-ru-analogies.txt (742 kB)
- ru-sl-analogies.txt (784 kB)
- lt-fi-analogies.txt (589 kB)
- fi-ru-analogies.txt (788 kB)
- en-ru-analogies.txt (768 kB)
- lv-et-analogies.txt (548 kB)
- ru-sv-analogies.txt (776 kB)
- lv-en-analogies.txt (564 kB)
- sv-lv-analogies.txt (574 kB)
- et-hr-analogies.txt (509 kB)
- hr-sl-analogies.txt (531 kB)
- sl-ru-analogies.txt (784 kB)
- fi-hr-analogies.txt (544 kB)
- sv-lt-analogies.txt (577 kB)
- lt-ru-analogies.txt (815 kB)
- en-hr-analogies.txt (522 kB)
- sv-en-analogies.txt (526 kB)
- ru-lv-analogies.txt (815 kB)
- lt-hr-analogies.txt (569 kB)
- et-sl-analogies.txt (519 kB)
- hr-sv-analogies.txt (531 kB)
- ru-lt-analogies.txt (815 kB)
- fi-sl-analogies.txt (555 kB)
- lv-fi-analogies.txt (585 kB)
- sv-et-analogies.txt (515 kB)
- ru-en-analogies.txt (768 kB)
- sl-hr-analogies.txt (531 kB)
- en-sl-analogies.txt (534 kB)
- lt-sl-analogies.txt (582 kB)
- et-sv-analogies.txt (515 kB)
- hr-lv-analogies.txt (567 kB)
- ru-et-analogies.txt (742 kB)
- fi-sv-analogies.txt (549 kB)
- en-sv-analogies.txt (526 kB)
- et-lt-analogies.txt (553 kB)
- hr-lt-analogies.txt (569 kB)
- sl-sv-analogies.txt (543 kB)
- hr-en-analogies.txt (522 kB)
- lt-sv-analogies.txt (577 kB)
- et-lv-analogies.txt (548 kB)
- lv-ru-analogies.txt (815 kB)
- fi-lv-analogies.txt (585 kB)
- sv-fi-analogies.txt (549 kB)
- lv-hr-analogies.txt (567 kB)
- en-lv-analogies.txt (564 kB)
- hr-et-analogies.txt (509 kB)
- fi-lt-analogies.txt (589 kB)
- et-en-analogies.txt (505 kB)
- en-lt-analogies.txt (568 kB)
- fi-en-analogies.txt (539 kB)
- sv-ru-analogies.txt (776 kB)
- sl-lt-analogies.txt (582 kB)
- ru-fi-analogies.txt (788 kB)
- sl-lv-analogies.txt (579 kB)
- lv-sl-analogies.txt (579 kB)
- fi-et-analogies.txt (526 kB)
- lt-lv-analogies.txt (612 kB)
- en-et-analogies.txt (505 kB)
- sl-et-analogies.txt (519 kB)
- sl-en-analogies.txt (534 kB)
- lt-et-analogies.txt (553 kB)
- sv-hr-analogies.txt (531 kB)

- Name
- readme.txt
- Size
- 1.06 KB
- Format
- Text file
- Description
- Basic information on the structure of the files
- MD5
- 1d5f5a80fa93cbc53f98814d00602fb4

Monolingual word analogy datasets are provided in 9 languages, each language in its own file named xy-analogies.txt, where xy is the two-letter ISO language code:
- en - English
- et - Estonian
- fi - Finnish
- hr - Croatian
- lt - Lithuanian
- lv - Latvian
- ru - Russian
- sl - Slovenian
- sv - Swedish

Each dataset consists of 15 categories: the first 5 are semantic categories and the last 10 are syntactic categories. The beginning of a category is marked by a line that starts with a colon (:), followed by whitespace and the category name, for example ": capitals-countries". The names of all syntactic categories begin with "gram". There is one entry per line, and each entry has four words (i.e. two word pairs that share the same relation).

Cross-lingual word analogy datasets have the same structure as the monolingual ones, except that one pair in each entry is in one language and the other pair in the same entry is in another language. For example, in the dataset sl-et-analogies.txt, each entry has four words, the leftmost two form a relation in Slovenian . . .
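
The file layout described in the readme can be read with a few lines of Python. The following is only a minimal sketch, not code shipped with the datasets. The function names (parse_analogies, is_syntactic, language_codes) and the sample lines are made up for illustration, the category name "gram-hypothetical" is invented, and the sketch assumes that the four words of an entry are separated by whitespace and that no word contains a space.

```python
def parse_analogies(lines):
    """Group four-word entries under their ': category' header lines."""
    categories = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            # A header such as ": capitals-countries" opens a new category.
            current = line[1:].strip()
            categories.setdefault(current, [])
            continue
        words = line.split()
        # Assumption: every entry has exactly four whitespace-separated words.
        if current is None or len(words) != 4:
            raise ValueError(f"unexpected line: {raw!r}")
        categories[current].append(tuple(words))
    return categories


def is_syntactic(category_name):
    """Syntactic category names begin with 'gram' according to the readme."""
    return category_name.startswith("gram")


def language_codes(filename):
    """Return the language codes encoded in a name like 'sl-et-analogies.txt'."""
    suffix = "-analogies.txt"
    if not filename.endswith(suffix):
        raise ValueError(f"not an analogy file name: {filename!r}")
    return tuple(filename[: -len(suffix)].split("-"))


# Illustrative sample only; these lines are not copied from the datasets.
SAMPLE = [
    ": capitals-countries",
    "Helsinki Finland Stockholm Sweden",
    ": gram-hypothetical",
    "good better bad worse",
]

if __name__ == "__main__":
    for name, entries in parse_analogies(SAMPLE).items():
        kind = "syntactic" if is_syntactic(name) else "semantic"
        print(name, kind, len(entries))
```

To read a real file, the same function can be given an open file object, for example parse_analogies(open("en-analogies.txt", encoding="utf-8")), assuming the files are UTF-8 encoded. The strict four-word check is a design choice of this sketch: it fails loudly if a file deviates from the format described in the readme instead of silently dropping lines.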