jpWaC-L (Japanese Web)

Japanese Web texts with automatically assigned difficulty levels ranging from 4 (very easy) to 0 (very difficult). Part-of-speech tags and lemmas were annotated with ChaSen. The crawl and annotation were carried out in 2007.

Counts
Tokens: 409,030,315
Words: 333,194,041
Sentences: 12,488,610
Documents: 49,536
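
To put the counts in perspective, a few ratios can be derived from them directly. The following is a minimal sketch in Python, not part of the corpus documentation: the function name `derived_stats` and the choice of ratios are illustrative assumptions, and only the four input numbers come from the summary itself.

```python
# Raw counts copied from the corpus summary.
COUNTS = {
    "tokens": 409030315,
    "words": 333194041,
    "sentences": 12488610,
    "documents": 49536,
}


def derived_stats(counts):
    """Return simple ratios computed from the raw corpus counts.

    Hypothetical helper for illustration; the ratio names are not
    terminology used by the corpus documentation.
    """
    return {
        "tokens_per_sentence": counts["tokens"] / counts["sentences"],
        "sentences_per_document": counts["sentences"] / counts["documents"],
        "tokens_per_document": counts["tokens"] / counts["documents"],
        "word_share_of_tokens": counts["words"] / counts["tokens"],
    }


if __name__ == "__main__":
    # Print each ratio rounded to two decimal places.
    for name, value in derived_stats(COUNTS).items():
        print(f"{name}: {value:.2f}")
```

Ratios of this kind are only rough descriptors: they say nothing about how sentence or document length varies across the five difficulty levels.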
General info
Corpus description: Document
Language: Japanese
Encoding: UTF-8
Compiled: 10/28/2017 18:35:29
Tagset: Description
Lexicon sizes
word: 606,696
lempos: 575,422
tag: 70
ctag: 70
level: 5
lc: 590,541
lemma: 574,518
lemma_lc: 558,363

Structures and attributes