Monolingual word analogy datasets are provided for 9 languages. Each language has its own file, named xy-analogies.txt, where xy is the two-letter ISO language code:
- en - English
- et - Estonian
- fi - Finnish
- hr - Croatian
- lt - Lithuanian
- lv - Latvian
- ru - Russian
- sl - Slovenian
- sv - Swedish

Each dataset consists of 15 categories: the first 5 are semantic categories and the last 10 are syntactic categories. The beginning of a category is marked by a line that starts with a colon (:), followed by whitespace and the category name, for example ": capitals-countries". The names of all syntactic categories begin with "gram". There is one entry per line, and each entry has four words (i.e. two word pairs that share the same relation).

Crosslingual word analogy datasets follow the same format as the monolingual sets, except that one pair in each entry is in one language and the other pair in the same entry is in another language. For example, in the dataset sl-et-analogies.txt, each entry has four words: the leftmost two form a relation in Slovenian, and the rightmost two form an equivalent relation (not a translation) in Estonian.
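
The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names parse_analogies and is_syntactic are hypothetical, the sample entries and the category name "gram-plural" are made up for illustration, and the code assumes that the four words of an entry are separated by whitespace and contain no internal spaces.

```python
def parse_analogies(lines):
    """Group four-word entries under their category headers.

    Assumption: words within an entry are whitespace-separated.
    """
    categories = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(":"):
            # Category header, e.g. ": capitals-countries"
            current = line[1:].strip()
            categories.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("entry found before any category header")
        words = line.split()
        if len(words) != 4:
            raise ValueError(f"expected 4 words, got {len(words)}: {line!r}")
        categories[current].append(tuple(words))
    return categories


def is_syntactic(category):
    """Syntactic category names begin with "gram"."""
    return category.startswith("gram")


# Illustrative sample only; these entries are not taken from the datasets.
sample = """\
: capitals-countries
Athens Greece Helsinki Finland
: gram-plural
cat cats dog dogs
"""
parsed = parse_analogies(sample.splitlines())
```

To load a real file, pass an open file object instead of the sample, for example parse_analogies(open("en-analogies.txt", encoding="utf-8")). For a crosslingual file such as sl-et-analogies.txt, the first two elements of each tuple are the pair in the first language and the last two are the pair in the second language.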