hr500k v1.0

Manually annotated Croatian corpus hr500k v1.0 (morphosyntax, syntax, named entities, semantic roles)

Counts
Tokens: 506,457
Words: 436,725
Sentences: 24,794
Documents: 900
General info
Corpus description Document
Language: Croatian
Encoding: UTF-8
Compiled: 04/13/2018 20:06:10
Tagset Description
Lexicon sizes
word
id
lempos
tag
feats
se_dep
se_dep_head_id
se_dep_head_lemma
se_dep_head_tag
ud_dep
ud_dep_head_id
ud_dep_head_lemma
ud_dep_head_tag
lc
norm
lemma
lemma_lc
Tags legend
Noun: N.*
Noun proper: Np.*
Noun common: Nc.*
Verb: V.*
Adjective: A.*
Pronoun: P.*
Adverb: R.*
Preposition: S.*
Conjunction: C.*
Numeral: M.*
Particle: Q.*
Interjection: I.*
Abbreviation: Y.*
Residual: X.*
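Each legend entry is a regular expression over the value of the tag attribute, so one tag can fall into several categories (a common noun matches both N.* and Nc.*). The sketch below is a minimal illustration, assuming the patterns must match the whole tag, as a corpus query such as [tag="N.*"] would. The helper name categories and the sample tags ("Ncmsn", "Vmr3s") are hypothetical examples in MULTEXT-East style, not values read from this page.

```python
import re
from typing import List

# Tags legend copied from this page: category label -> regex over the tag.
# Assumption: a pattern must match the entire tag value (like a CQL query).
TAG_LEGEND = {
    "Noun": r"N.*",
    "Noun proper": r"Np.*",
    "Noun common": r"Nc.*",
    "Verb": r"V.*",
    "Adjective": r"A.*",
    "Pronoun": r"P.*",
    "Adverb": r"R.*",
    "Preposition": r"S.*",
    "Conjunction": r"C.*",
    "Numeral": r"M.*",
    "Particle": r"Q.*",
    "Interjection": r"I.*",
    "Abbreviation": r"Y.*",
    "Residual": r"X.*",
}


def categories(tag: str) -> List[str]:
    """Hypothetical helper: every legend category whose regex matches the whole tag."""
    return [label for label, pattern in TAG_LEGEND.items()
            if re.fullmatch(pattern, tag)]


print(categories("Ncmsn"))  # ['Noun', 'Noun common']
```

A tag outside the legend (for example a punctuation tag, if the corpus uses one) simply yields an empty list.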
Lempos suffixes
Noun: -n
Verb: -v
Adjective: -a
Pronoun: -p
Adverb: -r
Preposition: -s
Conjunction: -c
Numeral: -m
Particle: -q
Interjection: -i
Abbreviation: -y
Residual: -x
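In every row, the lempos suffix is the lowercased first letter of the corresponding tag pattern in the Tags legend (Noun N.* gives -n, Verb V.* gives -v, and so on). The sketch below builds a lempos value on the assumption that lempos is the lemma followed by that suffix. The page does not state this rule explicitly, and the helper name make_lempos and the sample lemma/tag pairs are hypothetical.

```python
from typing import Optional

# Lempos suffixes from this page, keyed by the first letter of the tag.
# Assumption: lempos = lemma + suffix chosen by the tag's leading letter.
LEMPOS_SUFFIX = {
    "N": "-n", "V": "-v", "A": "-a", "P": "-p", "R": "-r", "S": "-s",
    "C": "-c", "M": "-m", "Q": "-q", "I": "-i", "Y": "-y", "X": "-x",
}


def make_lempos(lemma: str, tag: str) -> Optional[str]:
    """Hypothetical helper: append the part-of-speech suffix, or None if unmapped."""
    suffix = LEMPOS_SUFFIX.get(tag[:1])
    return lemma + suffix if suffix is not None else None


print(make_lempos("kuća", "Ncfsn"))  # kuća-n
```

Returning None for tags that are not in the table avoids guessing a suffix the page does not list.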