The Vejica corpus is stored as a tab-delimited text file with two columns: an ID and a sentence. The sentences are UTF-8 plain text, with "÷" (U+00F7) in place of each superfluous comma and "¤" (U+00A4) marking each missing comma. The IDs encode the source of the sampled sentence and start as follows:

KUST.de. - corpus KUST, first language German
KUST.en. - corpus KUST, first language English
KUST.es. - corpus KUST, first language Spanish
KUST.it. - corpus KUST, first language Italian
KUST.sh. - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1. - corpus Šolar, grammar school, 1st grade
Solar.G2. - corpus Šolar, grammar school, 2nd grade
Solar.G3. - corpus Šolar, grammar school, 3rd grade
Solar.G4. - corpus Šolar, grammar school, 4th grade
Solar.OS6. - corpus Šolar, primary school, 6th grade
Solar.OS7. - corpus Šolar, primary school, 7th grade
Solar.OS8. - corpus Šolar, primary school, 8th grade
Solar.OS9. - corpus Šolar, primary school, 9th grade
Solar.PS1. - corpus Šolar, vocational school, 1st grade
Solar.PS2. - corpus Šolar, vocational school, 2nd grade
Solar.PS3. - corpus Šolar, vocational school, 3rd grade
Solar.PS5. - corpus Šolar, vocational school, 5th grade
Solar.SS1. - corpus Šolar, technical school, 1st grade
Solar.SS2. - corpus Šolar, technical school, 2nd grade
Solar.SS3. - corpus Šolar, technical school, 3rd grade
Solar.SS4. - corpus Šolar, technical school, 4th grade
Solar.MT. - corpus Šolar, matura course
Lektor.Lektor. - corpus Lektor
Wiki.Wiki. - Wikipedia
Janes.Janes - corpus Janes-Vejica 1.0
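
The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names, the sample ID "Solar.G1.123" and the sample sentence are made up for illustration, and it assumes that each marker stands exactly where the comma is (for "÷") or should be (for "¤"), so that a plain character replacement recovers both the sentence as written and its corrected form.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma should be here but was not written


def parse_line(line):
    """Split one corpus line into (ID, sentence) at the first tab."""
    sentence_id, sentence = line.rstrip("\n").split("\t", 1)
    return sentence_id, sentence


def source_of(sentence_id):
    """Return the first two dot-separated parts of an ID, e.g. ("Solar", "G1")."""
    parts = sentence_id.split(".")
    corpus = parts[0]
    subset = parts[1] if len(parts) > 1 else ""
    return corpus, subset


def as_written(sentence):
    """Reconstruct the sentence with the author's original (erroneous) commas."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(sentence):
    """Reconstruct the sentence with comma placement corrected."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


# Hypothetical line; the ID suffix and the sentence are invented for this example.
sample = "Solar.G1.123\tMislim\u00a4 da je to\u00f7 res.\n"
sid, text = parse_line(sample)
print(source_of(sid))
print(as_written(text))
print(as_corrected(text))
```

When reading the real file, opening it with encoding="utf-8" and passing each line to parse_line should be enough, provided the sentences themselves contain no tab characters; the split is limited to the first tab as a precaution.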