Vertical writing systems are not handled correctly in gImageReader
Vertical writing systems can be OCRed (fairly) reliably with the tesseract command-line tool, but gImageReader produces garbled characters for them by default. Horizontal writing systems are not affected.
Here are some sample images (in chi_sim, jpn, chi_sim_vert, jpn_vert respectively):
Here are the results using tesseract:
(縦組み is not OCRed correctly, but that is not a big problem.)
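For anyone trying to reproduce the command-line results, here is a minimal sketch of how the tesseract CLI can be driven from Python. The file name `sample_jpn_vert.png` and the helper names `tesseract_cmd` and `ocr` are hypothetical, made up for illustration. The optional `--psm` argument is included only because page segmentation mode 5 ("single uniform block of vertically aligned text") seems worth comparing against; that it has anything to do with this bug is an assumption, not something confirmed here.

```python
import subprocess


def tesseract_cmd(image, lang, psm=None):
    """Build the argument list for the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognized text to standard output instead of writing a .txt file.
    """
    cmd = ["tesseract", str(image), "stdout", "-l", lang]
    if psm is not None:
        # Optional page segmentation mode, e.g. 5 for vertical text.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr(image, lang, psm=None):
    """Run tesseract and return the recognized text.

    Requires the tesseract binary and the requested language data
    to be installed; raises CalledProcessError if tesseract fails.
    """
    result = subprocess.run(
        tesseract_cmd(image, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this in place, `ocr("sample_jpn_vert.png", "jpn_vert")` would correspond to a plain command-line run, and `ocr("sample_jpn_vert.png", "jpn_vert", psm=5)` to the same run with vertical block segmentation forced.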
Here is the result using gImageReader (taking jpn_vert as an example):
I noticed that after rotating the image 90° counterclockwise, gImageReader gives the correct result:
(and 縦組み is OCRed correctly!)
This problem was already reported in Issue #552, but there it was mistakenly regarded as a bug in tessdata. Since the tesseract command-line tool handles these images correctly, the problem must be on gImageReader's side.
I'm using gImageReader 3.4.2 and tesseract 5.4.1 under Arch Linux, with the default tessdata provided by tesseract. I noticed that gImageReader's "About" dialog says it is using tesseract 5.3.4, so this version mismatch might be related to the problem.