Japanese characters recognition - incorrect output for some characters
Closed this issue · 2 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Run tesseract on the attached files.
What is the expected output? What do you see instead?
No errors in the OCR output file, i.e. all characters in the image are read correctly. Instead, some characters are misread, as described below.
What version of the product are you using? On what operating system?
Tesseract 3.02.02
OS: Windows 7
Please provide any additional information below.
A few hiragana characters are each read as two characters instead of one.
For instance:
1. ぽ read as ほま
2. ぷ read as ふて
3. ぶ read as ふご
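All three examples are kana carrying a dakuten or handakuten, and in each case the first output character is the unmarked base kana. The sketch below illustrates that pattern using Unicode canonical decomposition. It is only an illustration of the symptom, not a confirmed diagnosis; the helper names `base_and_mark` and `describe` are hypothetical and are not part of Tesseract.

```python
import unicodedata

# Misreads reported above: expected character -> OCR output
REPORTED = {"ぽ": "ほま", "ぷ": "ふて", "ぶ": "ふご"}


def base_and_mark(ch):
    """Split a kana into its base character and combining voicing mark, if any."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    mark = decomposed[1:]  # empty string when the kana has no voicing mark
    return base, mark


def describe(expected, actual):
    """Return (base kana, mark name, whether the OCR output starts with the base)."""
    base, mark = base_and_mark(expected)
    mark_name = unicodedata.name(mark) if mark else "none"
    keeps_base = actual.startswith(base)
    return base, mark_name, keeps_base


for expected, actual in REPORTED.items():
    base, mark_name, keeps_base = describe(expected, actual)
    print(f"{expected} -> {actual}: base={base}, mark={mark_name}, base kept={keeps_base}")
```

If the base kana is kept in every case, one possible reading is that the voicing mark is being segmented as a separate glyph and then matched to some other character, but that would need to be checked against the training data.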
I have created traineddata only for hiragana characters, just to begin with. I
would like to find a solution to this problem before I start on kanji.
Thanks for your time and support.
Original issue reported on code.google.com by sivakuma...@gmail.com
on 22 Jan 2015 at 5:11
GoogleCodeExporter commented
Attaching the result file
Original comment by sivakuma...@gmail.com
on 22 Jan 2015 at 5:13
GoogleCodeExporter commented
I am sorry, but we provide support only for language data files released by
this project (i.e. not for custom training). hir.traineddata was not
created or released by tesseract-ocr.
Original comment by zde...@gmail.com
on 7 Feb 2015 at 7:51
- Changed state: WontFix