Problem Recognizing Vertically Oriented Text

Question

Up to and including version 3.0.1, gImageReader was able to OCR vertically oriented text (e.g. text from old Chinese/Japanese novels, which were printed vertically). A bug seems to have been introduced in version 3.1, and I am no longer able to OCR such text: the program insists on recognizing it as horizontally oriented text. I would appreciate it if you could look into this problem. Thanks.

Answer 1 · 2015-08-08T14:13:24.000Z

Sorry, I mixed up the versions. The bug seems to have been introduced in version 3.0; the program worked properly in version 0.9.

Answer 2 · 2015-08-08T18:38:27.000Z

Hello

There isn't anything I explicitly changed in this regard between 0.9 and 3.x. However, it should be possible to handle this correctly. Do I understand correctly that your issue is that recognizing vertical text returns one character per line, instead of it being "flattened out" onto a single line?

Answer 3 · 2015-08-10T11:29:04.000Z

I was trying to recognize an entire block of text in which the Chinese characters are arranged vertically, meaning the lines of text run vertically. I tried the Tesseract command line with the default "-psm 3" parameter, and the text was recognized properly. Say the text looks like this (example 1):

A O
p r
p a
l n
e g
  e

Version 0.9 and the Tesseract command line return the correct result, as follows:
Apple
Orange

But with 3.x, I get the text back in vertical form, as in (example 1) above.
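
To make the expected "flattening" concrete, here is a minimal Python sketch of the transformation. It is only an illustration of the desired result, not how gImageReader or Tesseract work internally. The function name `flatten_vertical_text` and its `right_to_left` parameter are hypothetical, and the sketch assumes one character per column with single-space separators, as in (example 1). Traditional Chinese/Japanese vertical text is read with columns ordered right to left, which the flag accounts for.

```python
from itertools import zip_longest
from typing import List


def flatten_vertical_text(block: str, right_to_left: bool = False) -> List[str]:
    """Turn a grid of vertically written text into horizontal lines.

    Assumes each row holds one character per column, with columns
    separated by a single space (so characters sit at even indices).
    """
    # Keep only the character positions, dropping the separator spaces.
    rows = [row[::2] for row in block.splitlines() if row.strip()]
    # Transpose rows into columns; shorter rows are padded with spaces.
    columns = ["".join(chars).strip() for chars in zip_longest(*rows, fillvalue=" ")]
    if right_to_left:
        columns.reverse()
    # Drop columns that turned out to be empty.
    return [column for column in columns if column]


example_1 = "A O\np r\np a\nl n\ne g\n  e"
print(flatten_vertical_text(example_1))  # ['Apple', 'Orange']
```

Passing `right_to_left=True` returns the columns in the opposite order, which matches the reading order of vertically printed Chinese or Japanese text.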

Answer 4 · 2015-08-10T11:30:33.000Z

By the way, the vertical text in (example 1) should line up as two vertical lines.

Answer 5 · 2015-08-10T11:31:01.000Z

Ok I'll look into it.

Answer 6 · 2015-08-10T11:44:52.000Z

Thanks. Attached is a sample text (in Chinese), in case it helps. The Tesseract command line to recognize it is:
tesseract text.jpg test -l chi_tra

Answer 7 · 2015-08-12T06:40:48.000Z

Hello

Is

他馬上精砷起來
_走到了廳中。

果然不出所料'
很迷人。

the expected output for this?

Answer 8 · 2015-08-15T01:53:01.000Z

Yes, the result is correct.

Answer 9 · 2015-08-23T13:08:03.000Z

This will be fixed in the upcoming release (probably by the end of the month).