The < character is recognized as K

Question

Closed this issue 5 years ago · 3 comments

The < character is frequently recognized as K. How can I improve this?

Answer 1 · 2019-07-31T10:00:16.000Z

Try using the "legacy" Tesseract version; it seems to be somewhat better on average.
In theory, training a dedicated OCR model for the OCR-B font should help as well. However, I haven't personally had any reliable results in this area yet, and the legacy recognizer that comes out of the box is still pretty good.

Secondly, the better the quality of your image, the better the recognition.

Finally, it might be possible to implement some postprocessing of the text (e.g. force-converting K's to <'s in the regions of text where we expect <'s); however, this is not on my agenda at the moment.
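
The postprocessing idea mentioned above could look roughly like the sketch below. This is not part of the library; the helper name fix_filler_ks and the heuristic itself are assumptions made for illustration. It converts a K to < only when the K directly follows a < and is itself followed by another < or by the end of the line, so K's inside names are left alone.

```python
import re

# Assumption: a K counts as a misread filler only when it directly follows
# a '<' and is followed by another '<' or by the end of the line.
_MISREAD_FILLER = re.compile(r"(?<=<)K+(?=<|$)")


def fix_filler_ks(mrz_line: str) -> str:
    """Replace K's that sit inside filler runs with '<' (hypothetical helper)."""
    return _MISREAD_FILLER.sub(lambda m: "<" * len(m.group()), mrz_line)


# Example: a name line where some trailing fillers were misread as K.
raw = "P<UTOERIKSSON<<ANNA<MARIA<<<K<<<<KK<<<<<<<K<K"
print(fix_filler_ks(raw))
```

The K in ERIKSSON is untouched because it is not preceded by a <. The heuristic is deliberately conservative and still imperfect: a genuine single-letter name component "K" standing between two fillers would be converted as well, so it is probably best applied only to regions where fillers are expected.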

Answer 2 · 2020-03-30T08:26:08.000Z

Hi @konstantint,
thanks for your work.

"Try using the "legacy" Tesseract version,"

How can I use the legacy version of Tesseract in a Python app?

I am facing an accuracy issue (valid_score is 100, but the MRZ contains inaccurate values).

Please advise.

Answer 3 · 2020-03-30T11:51:13.000Z

I found it.

Just passing extra_cmdline_params='--oem 0' will enable the legacy mode.

Am I right?

Thanks
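
For reference, a minimal sketch of what that call might look like from Python. It assumes the library's entry point is passporteye.read_mrz and that it accepts the extra_cmdline_params keyword quoted above; the wrapper name read_mrz_legacy and the image path are hypothetical. Note that --oem 0 selects Tesseract's legacy engine, which only works if the installed traineddata file includes the legacy model (the tessdata_fast and tessdata_best files are LSTM-only).

```python
# Tesseract flag selecting the legacy (non-LSTM) OCR engine.
LEGACY_OCR_PARAMS = "--oem 0"


def read_mrz_legacy(image_path):
    """Run MRZ extraction with the legacy Tesseract engine (hypothetical wrapper)."""
    # Imported lazily so this module loads even without passporteye installed.
    # Assumption: read_mrz is the library's entry point for MRZ extraction.
    from passporteye import read_mrz

    return read_mrz(image_path, extra_cmdline_params=LEGACY_OCR_PARAMS)


if __name__ == "__main__":
    mrz = read_mrz_legacy("passport.jpg")  # hypothetical file name
    if mrz is None:
        print("No MRZ found")
    else:
        print(mrz.valid_score)
```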