No OCR happening after importing PDF

Question

Closed this issue 23 days ago · 6 comments

neo269 commented 25 days ago

Describe the bug
No OCR is performed after importing a document.

To Reproduce
Steps to reproduce the behavior:

1. Go to 'Import'
2. Click on 'PDF'
3. The PDF gets imported
4. Go to 'Save PDF'
5. After saving, open the PDF: it has not been OCR'd. The saved file is the same as the imported PDF.

OCR Language is Gujarati

Expected behavior
Expected an OCR'd PDF

Desktop (please complete the following information):

OS: Windows 11
Version 7.5.1

Additional context
I have been trying for two days but cannot get OCR working on my PDF. Attaching just one page in case someone can help me out.
test.pdf

Answer 1 · 2024-09-21T15:56:22.000Z

NAPS2 won't do OCR on a page that already has text (in this case the header/footer).
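
To see whether a PDF already carries a text layer before importing it, a quick check along these lines may help. This is only a sketch: it assumes the third-party `pypdf` package is installed, the helper names `pages_with_text` and `extract_page_texts` are made up for illustration, and it says nothing about how NAPS2 itself detects text.

```python
from typing import Iterable, List


def pages_with_text(page_texts: Iterable[str]) -> List[int]:
    """Return the 0-based indexes of pages whose extracted text is non-empty."""
    return [i for i, text in enumerate(page_texts) if text and text.strip()]


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of each page with pypdf (assumed to be installed)."""
    # Imported lazily so pages_with_text() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


# Hypothetical usage, with "test.pdf" standing in for your own file:
#   print(pages_with_text(extract_page_texts("test.pdf")))
```

If the returned list is non-empty, those pages already contain some text (for example a header or footer), which would be consistent with OCR being skipped on them.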

Answer 2 · 2024-09-21T16:11:39.000Z

So if I crop the header and footer out of the PDF, will OCR work?

Answer 3 · 2024-09-21T16:20:25.000Z

Well, technically any change you make in NAPS2 (even cropping by just 1 pixel) will rasterize the PDF (turning it fully into an image without any text), which will let OCR work, though it might change the PDF quality/file size. Or if you had a PDF editor that could just remove the text, that would work too.

Answer 4 · 2024-09-21T16:40:55.000Z

Thanks for the insight.
I just tried to OCR an English-language PDF from the net and it worked.
However, when I tried your suggestion with a cropped PDF in Gujarati, it did not work. (I cropped it using Acrobat.)

Attaching both PDFs if you need to test.

english_sample.pdf
gujarati_cropped_sample.pdf

Edit:
If the PDF is saved as an image, then imported and saved to PDF, OCR works!

Answer 5 · 2024-09-21T18:59:19.000Z

Whatever application you used to crop the header/footer left the text there, just off the page; it would need to be fully deleted. But yes, converting to an image should work (as would doing the crop in NAPS2, etc.).
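
The difference between cropping and deleting can be illustrated with a toy model. Everything here is hypothetical (the `Page` and `TextItem` classes, the coordinates, the helper names); it does not reflect real PDF internals or NAPS2's code, only the idea that a crop may shrink the visible area while the text objects stay in the file.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top)


@dataclass
class TextItem:
    content: str
    x: float
    y: float


@dataclass
class Page:
    visible_box: Box
    text_items: List[TextItem] = field(default_factory=list)


def crop(page: Page, new_box: Box) -> Page:
    """Shrink the visible area only; the text items are left untouched."""
    return Page(visible_box=new_box, text_items=list(page.text_items))


def visible_text(page: Page) -> List[TextItem]:
    """Text items whose position falls inside the visible box."""
    left, bottom, right, top = page.visible_box
    return [t for t in page.text_items
            if left <= t.x <= right and bottom <= t.y <= top]


def has_any_text(page: Page) -> bool:
    """True if the page holds any text item, visible or not."""
    return bool(page.text_items)


def delete_hidden_text(page: Page) -> Page:
    """Actually remove the text items that lie outside the visible box."""
    return Page(visible_box=page.visible_box, text_items=visible_text(page))


page = Page((0, 0, 612, 792),
            [TextItem("Header", 72, 770), TextItem("Footer", 72, 20)])
cropped = crop(page, (0, 50, 612, 740))
cleaned = delete_hidden_text(cropped)
```

In this model `visible_text(cropped)` is empty, yet `has_any_text(cropped)` is still true, so a tool that skips OCR on any page containing text would still skip it. Only `cleaned`, where the hidden items were removed, counts as text-free.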

Answer 6 · 2024-09-22T09:33:53.000Z

Thanks for all the insights. Especially your first comment! OCR is now working for all my PDFs.