keensoft/alfresco-simple-ocr

Spaces between characters in OCR'ed PDFs

DavBE opened this issue · 2 comments

DavBE commented

Hi,

Every PDF file I OCR in Alfresco contains spaces between each character.

Example: the word "client" becomes "c l i e n t".

Maybe it's a pdfsandwich issue, but since it is called from alfresco-simple-ocr I thought I would ask here. Is there anything I can do to solve this?

System is Ubuntu 16.06 x64, running Alfresco 5.2.0 (re21f2be5-b22)

Regards,

David

If you are using pdfsandwich, switch to OCRmyPDF.

Also try running the transformation from the command line to determine whether the problem is inside or outside Alfresco.
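A rough sketch of such a command-line check is below. It is not part of alfresco-simple-ocr. It assumes `ocrmypdf` and `pdftotext` (from poppler-utils) are on the PATH, and it uses a made-up heuristic: if most whitespace-separated tokens in the extracted text are single letters, the "c l i e n t" symptom is probably present. The function names, the script name, and the 0.5 threshold are all illustrative assumptions.

```python
import subprocess
import sys


def single_char_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that are one alphabetic character."""
    tokens = text.split()
    if not tokens:
        return 0.0
    singles = sum(1 for t in tokens if len(t) == 1 and t.isalpha())
    return singles / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.5) -> bool:
    """Heuristic only: True when at least `threshold` of the tokens are single letters."""
    return single_char_ratio(text) >= threshold


def ocr_and_extract(src: str, dst: str) -> str:
    """Run OCRmyPDF outside Alfresco, then dump the text layer with pdftotext."""
    # Assumes both tools are installed; check=True raises if either one fails.
    subprocess.run(["ocrmypdf", src, dst], check=True)
    result = subprocess.run(
        ["pdftotext", dst, "-"],  # "-" sends the extracted text to stdout
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_ocr.py scanned.pdf scanned_ocr.pdf
    extracted = ocr_and_extract(sys.argv[1], sys.argv[2])
    if looks_letter_spaced(extracted):
        print("Text layer looks letter-spaced; the OCR tool itself is the likely cause.")
    else:
        print("Text layer looks normal outside Alfresco.")
```

If the output is already letter-spaced when the tool is run this way, Alfresco is not the cause. Swapping the `ocrmypdf` call for the pdfsandwich command in use would allow the same comparison for that tool.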

DavBE commented

Hi Angel,

Thanks for your reply.

I ran pdfsandwich on the command line and the issue persisted. I installed OCRmyPDF and no more spaces!

Thank you very much.

David