Corpus of comma placement Vejica 1.3

Name: Corpus of comma placement Vejica 1.3
License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Holozan, Peter

dc.contributor.author	Holozan, Peter
dc.date.accessioned	2018-04-15T08:09:08Z
dc.date.available	2018-04-15T08:09:08Z
dc.date.issued	2018-04-15
dc.identifier.uri	http://hdl.handle.net/11356/1185
dc.description	A collection of sentences demonstrating and correcting comma usage. The sentences come from five sources:
- KUST: a Slovene learner corpus, https://nl.ijs.si/isjt06/proc/26_Stritar.pdf
- Šolar: a corpus of student writing, http://www.slovenscina.eu/korpusi/solar
- Lektor: a corpus of proof-reading corrections, http://www.slovenscina.eu/korpusi/lektor
- Wikipedija: https://sl.wikipedia.org/wiki/Glavna_stran
- Janes: Tweet comma corpus Janes-Vejica 1.0, http://hdl.handle.net/11356/1088
For Janes, the comma corrections from the source corpus were used. For Lektor, the comma corrections of the proof-readers were used, together with additional corrections added by Peter Holozan. For the other texts, the comma errors were manually marked by Peter Holozan.
dc.language.iso	slv
dc.publisher	Amebis, d. o. o., Kamnik
dc.relation.isreferencedby	http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Holozan_Zbirka-primerov-rabe-vejice-Vejica-1-3.pdf
dc.relation.replaces	http://hdl.handle.net/11356/1055
dc.rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label	PUB
dc.source.uri	http://peter.amebis.si/vejica.html
dc.subject	comma placement
dc.subject	error annotation
dc.subject	manual annotation
dc.title	Corpus of comma placement Vejica 1.3
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN.SI data & tools
contact.person	Peter Holozan peter.holozan@amebis.si Amebis, d. o. o., Kamnik
size.info	104184 sentences
files.count	2
files.size	3981911

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Name: README.txt
Size: 1.57 KB
Format: Text file
Description: Unknown
MD5: 44e2ee5cc966591d100d2a750bc0fd60

File Preview

The Vejica corpus is stored as a tab-delimited text file with two
columns: an ID and one sentence.

The sentences are UTF-8 plain text with "÷" (U+00F7) in place of
a superfluous comma and "¤" (U+00A4) in place of a missing comma.
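
The format described above can be read with a few lines of Python. This is
a minimal sketch, not part of the corpus distribution. It assumes that each
marker sits exactly where the comma was or should be, so that replacing or
deleting the marker yields the sentence as written or as corrected. The
function names and the sample line (ID and sentence) are invented for
illustration.

```python
SUPERFLUOUS = "\u00f7"  # "÷": a comma was written here but should not be
MISSING = "\u00a4"      # "¤": a comma should be here but was not written


def parse_line(line):
    """Split one tab-delimited corpus line into (ID, annotated sentence)."""
    sentence_id, annotated = line.rstrip("\n").split("\t", 1)
    return sentence_id, annotated


def as_written(annotated):
    """Restore the original text: put superfluous commas back, drop missing ones."""
    return annotated.replace(SUPERFLUOUS, ",").replace(MISSING, "")


def as_corrected(annotated):
    """Build the corrected text: drop superfluous commas, insert missing ones."""
    return annotated.replace(SUPERFLUOUS, "").replace(MISSING, ",")


# Invented sample line; the ID is a placeholder, not a real corpus ID.
sample = "EXAMPLE.1\tMislim\u00a4 da je to\u00f7 prav.\n"
sentence_id, annotated = parse_line(sample)
print(sentence_id)
print(as_written(annotated))
print(as_corrected(annotated))
```

To process the whole corpus, the same functions could be applied to each
line of vejica13.txt opened with encoding="utf-8".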

The IDs encode the source of the sampled sentence and start as follows:
KUST.de.   - corpus KUST, first language German
KUST.en.   - corpus KUST, first language English
KUST.es.   - corpus KUST, first language Spanish
KUST.it.   - corpus KUST, first language Italian
KUST.sh.   - corpus KUST, first language Croatian, Serbian or Bosnian
Solar.G1.  - corpus Šolar, grammar school, 1st grade 
Solar.G2.  - corpus Šolar, grammar school, 2nd grade 
Solar.G3.  - corpus Šolar, grammar school, 3rd grade 
Solar.G4.  - corpus Šolar, grammar school, 4th grade 
Solar.OS6. - corpus Šolar, primary school, 6th grade 
Solar.OS7. - corpus Šolar, primary school, 7th grade 
Solar.OS8. - corpus Šolar, primary school, 8th grade 
Solar.OS9. - corpus Šolar, primary school, 9th grade 
Sola . . .
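
The ID prefixes listed above can be used to group sentences by source. The
sketch below is an assumption-laden illustration: it treats the first two
dot-separated fields of an ID as the source prefix, which matches the
prefixes shown (for example KUST.de. or Solar.OS6.) but may not hold for
prefixes that are cut off in this preview. The function names and the ID
remainders in the example are hypothetical.

```python
from collections import Counter


def source_prefix(sentence_id):
    """Return the first two dot-separated fields of an ID, e.g. 'KUST.de'."""
    parts = sentence_id.split(".")
    return ".".join(parts[:2])


def count_by_source(sentence_ids):
    """Count how many IDs fall under each source prefix."""
    return Counter(source_prefix(sentence_id) for sentence_id in sentence_ids)


# Placeholder IDs: only the prefixes come from the README, the rest is invented.
ids = ["KUST.de.a", "KUST.de.b", "Solar.OS6.c"]
for prefix, count in sorted(count_by_source(ids).items()):
    print(prefix, count)
```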

Name: vejica13.zip
Size: 3.8 MB
Format: application/zip
Description: Unknown
MD5: ddb65e98a7435718f80bb23591e2999d

File Preview

- vejica13.txt (11 MB)
