UB-Mannheim/tesseract

The eng.traineddata provided by default cannot be used for OSD

Closed this issue · 3 comments

Current Behavior: The default eng.traineddata is only 4017 KB, and OSD does not work with it.

Expected Behavior: The eng.traineddata from the Tesseract GitHub repository is 23956 KB, and OSD works with it as expected.

Suggested Fix: Please update the file.

The provided file is not faulty. It is simply a model made for fast OCR, which suits most users.

If you want to do OSD, either don't specify a language or, if you have to specify one, get a traineddata file from https://github.com/tesseract-ocr/tessdata/.
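A minimal sketch of the two options above, driving the `tesseract` command line from Python. It assumes `tesseract` is on the PATH and uses page segmentation mode 0 (orientation and script detection only). The helper names `build_osd_command` and `run_osd`, the image name `passport.png`, and the directory `C:/tessdata` are hypothetical and used only for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def build_osd_command(image_path: str,
                      lang: Optional[str] = None,
                      tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a tesseract command line that runs OSD only (--psm 0).

    With lang=None no -l option is passed, which is the
    "don't specify a language" option from the reply above.
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", "0"]
    if tessdata_dir is not None:
        # Point tesseract at a directory holding the downloaded traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd


def run_osd(image_path: str,
            lang: Optional[str] = None,
            tessdata_dir: Optional[str] = None) -> str:
    """Run OSD and return whatever tesseract prints to stdout."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_osd_command(image_path, lang, tessdata_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `run_osd("passport.png")` corresponds to the first option. Calling `run_osd("passport.png", lang="eng", tessdata_dir="C:/tessdata")` corresponds to the second, provided that directory contains an eng.traineddata taken from the tessdata repository.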

Thanks for the reply Stweil,

I tried OSD without specifying a language on passport images, and it did not give me correct results. Please try it once.

[Attached images: tp3, type3, type2]

Indeed. Then you have to get the right eng.traineddata.
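For anyone following along, here is a rough sketch of fetching that file and checking its size against the numbers reported above. The raw-download URL pattern and the `main` branch name are assumptions about how the tessdata repository is served, and the helper names are hypothetical. Verify the URL in a browser before relying on it.

```python
import os
import urllib.request

# Assumption: files in the tessdata repository can be fetched from this base URL.
TESSDATA_RAW_BASE = "https://github.com/tesseract-ocr/tessdata/raw/main"


def traineddata_url(lang: str) -> str:
    """Return the assumed download URL for <lang>.traineddata."""
    return f"{TESSDATA_RAW_BASE}/{lang}.traineddata"


def download_traineddata(lang: str, dest_dir: str) -> str:
    """Download <lang>.traineddata into dest_dir and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"{lang}.traineddata")
    urllib.request.urlretrieve(traineddata_url(lang), dest)
    return dest


def size_kb(path: str) -> int:
    """File size in whole kilobytes, for comparison with the sizes quoted in this thread."""
    return os.path.getsize(path) // 1024
```

After `download_traineddata("eng", dest_dir)`, a `size_kb` result in the region of the 23956 KB reported above, rather than about 4017 KB, suggests the larger model is in place. Exact sizes may differ between releases.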