jpWaC-L (Japanese Web)

Japanese Web texts with automatically assigned difficulty levels from 4 (very easy) to 0 (very difficult). Part-of-speech tags and lemmas were annotated with ChaSen. The crawl and the annotation were both carried out in 2007.

Counts
Tokens: 409,030,315
Words: 333,194,041
Sentences: 12,488,610
Documents: 49,536
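
The counts above can be turned into a few average figures. The following is a minimal sketch that uses only the four numbers listed on this page; the variable and key names are illustrative and not part of any corpus tool, and the reading of the token/word difference as punctuation and similar non-word tokens is an assumption.

```python
# Counts copied from the corpus summary on this page.
TOKENS = 409_030_315
WORDS = 333_194_041
SENTENCES = 12_488_610
DOCUMENTS = 49_536

# Derived averages; the key names are illustrative only.
ratios = {
    "tokens_per_sentence": TOKENS / SENTENCES,
    "words_per_sentence": WORDS / SENTENCES,
    "sentences_per_document": SENTENCES / DOCUMENTS,
    "tokens_per_document": TOKENS / DOCUMENTS,
    # Assumption: tokens not counted as words are punctuation and similar items.
    "non_word_token_share": (TOKENS - WORDS) / TOKENS,
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

These are corpus-wide averages; the page gives no per-level breakdown, so nothing here says how the figures vary across the difficulty levels.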
General info
Corpus description: Document
Language: Japanese
Encoding: UTF-8
Compiled: 10/28/2017 18:35:29
Tagset: Description
Lexicon sizes
word
lempos
tag
ctag
level
lc
lemma
lemma_lc