tesseract-ocr/tessdata

Symbolic languages like Chinese, Korean and Japanese need to be updated

vsatyamesc opened this issue · 1 comment

Symbolic languages like Chinese, Korean and Japanese need to be updated, because the old fonts are not used much anymore and there are some new characters too.

I'm interested in the Japanese and Chinese models. I have only done small-scale training in the past (years ago) for Japanese, for local use and testing. Are there any good resources on how I can improve the models with additional fonts?
And, for example, what should I look at during training?