hr500k v1.0

Manually annotated Croatian corpus hr500k v1.0 (morphosyntax, syntax, named entities, semantic roles)

Counts
Tokens 506457
Words 436725
Sentences 24794
Documents 900
General info
Corpus description Document
Language Croatian
Encoding UTF-8
Compiled 04/13/2018 20:06:10
Tagset Description
Lexicon sizes
word 73548
id 506457
lempos 36679
tag 768
feats 1907
se_dep 42
se_dep_head_id 72073
se_dep_head_lemma 11848
se_dep_head_tag 455
ud_dep 16
ud_dep_head_id 93273
ud_dep_head_lemma 14224
ud_dep_head_tag 566
lc 66797
norm 66797
lemma 34321
lemma_lc 33244
Tags legend
Noun N.*
Noun proper Np.*
Noun common Nc.*
Verb V.*
Adjective A.*
Pronoun P.*
Adverb R.*
Preposition S.*
Conjunction C.*
Numeral M.*
Particle Q.*
Interjection I.*
Abbreviation Y.*
Residual X.*
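
The patterns in the tags legend are regular expressions over the tag attribute, so a tag can be resolved to its category by matching it against each pattern in turn. The following is a minimal Python sketch of that lookup, not part of the corpus tooling: the function name tag_category and the sample tags in the usage note are illustrative assumptions, and the two more specific noun patterns are checked before the general one so that a proper or common noun is not reported as plain "Noun".

```python
import re

# Patterns copied from the tags legend. Order matters: the more specific
# noun patterns come first so they win over the general "N.*" pattern.
TAG_LEGEND = [
    ("Noun proper", r"Np.*"),
    ("Noun common", r"Nc.*"),
    ("Noun", r"N.*"),
    ("Verb", r"V.*"),
    ("Adjective", r"A.*"),
    ("Pronoun", r"P.*"),
    ("Adverb", r"R.*"),
    ("Preposition", r"S.*"),
    ("Conjunction", r"C.*"),
    ("Numeral", r"M.*"),
    ("Particle", r"Q.*"),
    ("Interjection", r"I.*"),
    ("Abbreviation", r"Y.*"),
    ("Residual", r"X.*"),
]


def tag_category(tag):
    """Return the legend label whose pattern matches the whole tag, or None.

    tag_category is a hypothetical helper name, not a corpus API.
    """
    for label, pattern in TAG_LEGEND:
        if re.fullmatch(pattern, tag):
            return label
    return None
```

With the sample tags assumed here, tag_category("Ncmsn") gives "Noun common" and tag_category("Vmr3s") gives "Verb"; a tag that starts with a letter not covered by the legend gives None.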
Lempos suffixes
Noun -n
Verb -v
Adjective -a
Pronoun -p
Adverb -r
Preposition -s
Conjunction -c
Numeral -m
Particle -q
Interjection -i
Abbreviation -y
Residual -x
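
A lempos value is the lemma followed by one of the suffixes listed above. As a rough Python sketch of how such values could be built and taken apart (the helper names make_lempos and split_lempos and the example lemma are assumptions for illustration, not part of the corpus tooling):

```python
# Suffixes copied from the "Lempos suffixes" list.
LEMPOS_SUFFIXES = {
    "Noun": "-n",
    "Verb": "-v",
    "Adjective": "-a",
    "Pronoun": "-p",
    "Adverb": "-r",
    "Preposition": "-s",
    "Conjunction": "-c",
    "Numeral": "-m",
    "Particle": "-q",
    "Interjection": "-i",
    "Abbreviation": "-y",
    "Residual": "-x",
}

# Reverse lookup: suffix -> category label.
SUFFIX_TO_CATEGORY = {suffix: label for label, suffix in LEMPOS_SUFFIXES.items()}


def make_lempos(lemma, category):
    """Append the category suffix to a lemma (hypothetical helper)."""
    return lemma + LEMPOS_SUFFIXES[category]


def split_lempos(lempos):
    """Split a lempos into (lemma, category) (hypothetical helper).

    The split is made at the last hyphen, so lemmas that themselves
    contain hyphens stay intact. If the ending is not one of the listed
    suffixes, the input is returned unchanged with category None.
    """
    lemma, sep, code = lempos.rpartition("-")
    category = SUFFIX_TO_CATEGORY.get(sep + code)
    if not lemma or category is None:
        return lempos, None
    return lemma, category
```

For example, make_lempos("kuća", "Noun") gives "kuća-n", and split_lempos("kuća-n") gives ("kuća", "Noun").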

Structures and attributes