Corpus of comma placement Vejica 1.0

Name: Corpus of comma placement Vejica 1.0
License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Holozan, Peter

CLARIN.SI data & tools

Authors: Holozan, Peter

Item identifier: http://hdl.handle.net/11356/1055

Project URL: http://peter.amebis.si/vejica.html

Date issued: 2016-03-03

Type: corpus, text

Size: 113309 sentences

Language(s): Slovenian

Description: A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources:
- KUST: a Slovene learner corpus, https://nl.ijs.si/isjt06/proc/26_Stritar.pdf
- Šolar: a corpus of student writing, http://www.slovenscina.eu/korpusi/solar
- Lektor: a corpus of proof-reading corrections, http://www.slovenscina.eu/korpusi/lektor
- Wikipedija: https://sl.wikipedia.org/wiki/Glavna_stran

For Lektor, the proof-readers' comma corrections were used. For the other texts, the comma errors were manually marked by Peter Holozan.

Publisher: Amebis, d. o. o., Kamnik

Keywords: comma placement, error annotation, manual annotation

Collections: CLARIN.SI data & tools

This item has been superseded by a newer one:

http://hdl.handle.net/11356/1185

Files in this item

This item is Publicly Available and licensed under:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Name: README.txt
Size: 1.53 KB
Format: Text file
Description: Description of the format.
MD5: feaf50df36f2595ced352c0d45469ba8

File preview

The Vejica corpus is stored as a tab-delimited text file with two
columns: an ID and one sentence.

The sentences are UTF-8 plain text with "÷" (U+00F7) in place of each
superfluous comma and "¤" (U+00A4) in place of each missing comma.
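The marker convention above can be illustrated with a minimal Python sketch. It assumes that each marker sits exactly where the comma is or should be, so plain character substitution recovers both versions of a sentence; the README does not spell this out. The function names and the sample ID suffix are hypothetical.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma that was written but should not be there
MISSING = "\u00a4"      # "¤": a comma that was omitted but should be there


def parse_line(line):
    """Split one corpus line into (ID, sentence) at the first tab."""
    sentence_id, sentence = line.rstrip("\n").split("\t", 1)
    return sentence_id, sentence


def as_written(sentence):
    """Sentence as originally written: superfluous commas kept, missing ones absent."""
    return sentence.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(sentence):
    """Sentence after correction: superfluous commas dropped, missing ones added."""
    return sentence.replace(SUPERFLUOUS, "").replace(MISSING, ",")


def read_corpus(path):
    """Yield (ID, sentence) pairs from a tab-delimited UTF-8 corpus file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                yield parse_line(line)
```

For an invented line such as `KUST.de.example<TAB>Mislim¤ da je to÷ dobro.`, `as_written` gives `Mislim da je to, dobro.` and `as_corrected` gives `Mislim, da je to dobro.`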

The IDs encode the source of the sampled sentence and start as follows:
KUST.de.   - corpus KUST, first language German 
KUST.en.   - corpus KUST, first language English 
KUST.es.   - corpus KUST, first language Spanish 
KUST.it.   - corpus KUST, first language Italian 
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian 
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
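As a rough sketch, the ID prefixes listed above can be mapped back to their source with a simple lookup. The table covers only the prefixes visible in this preview, which is truncated, so any other ID yields `None`. The function name and the sample ID suffixes are hypothetical.

```python
# Prefixes exactly as listed in the README preview; the preview is truncated,
# so this table is incomplete by construction.
ID_PREFIXES = {
    "KUST.de.": ("KUST", "first language German"),
    "KUST.en.": ("KUST", "first language English"),
    "KUST.es.": ("KUST", "first language Spanish"),
    "KUST.it.": ("KUST", "first language Italian"),
    "KUST.sh.": ("KUST", "first language Croatian, Serbian or Bosnian"),
    "Solar.G1.": ("Šolar", "grammar school, 1st grade"),
    "Solar.G2.": ("Šolar", "grammar school, 2nd grade"),
    "Solar.G3.": ("Šolar", "grammar school, 3rd grade"),
    "Solar.G4.": ("Šolar", "grammar school, 4th grade"),
    "Solar.OS6.": ("Šolar", "primary school, 6th grade"),
    "Solar.OS7.": ("Šolar", "primary school, 7th grade"),
    "Solar.OS8.": ("Šolar", "primary school, 8th grade"),
    "Solar.OS9.": ("Šolar", "primary school, 9th grade"),
}


def describe_source(sentence_id):
    """Return (corpus, detail) for a known ID prefix, or None if unknown."""
    for prefix, description in ID_PREFIXES.items():
        if sentence_id.startswith(prefix):
            return description
    return None
```

Returning `None` for unlisted prefixes keeps the lookup from guessing about the part of the README that is cut off here.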

Name: vejica10.zip
Size: 4.01 MB
Format: application/zip
Description: Tab-delimited text file with two columns: ID and sentence.
MD5: db6b1a854660fe1f142f80e56e10e250
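
A downloaded copy can be checked against the MD5 value listed above. The following is a minimal sketch using Python's standard `hashlib`; the function name and the assumption that the archive is saved locally as `vejica10.zip` are illustrative only.

```python
import hashlib

# MD5 value as listed for vejica10.zip in this record.
EXPECTED_MD5 = "db6b1a854660fe1f142f80e56e10e250"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the listed checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_listed_md5("vejica10.zip")` on the local copy indicates whether the download is intact.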

File preview

- vejica10.txt (11 MB)
