cyanfish/naps2

PDFs generated by NAPS2 are quite difficult (and inconvenient) to compress

Closed this issue · 1 comment

Hello,

At work, I usually scan exams that consist of printed text that includes maths (the questions) and handwritten text (the answers). I usually use the Sharp photocopier at work (which uses its built-in firmware: no computer connected), and I would like to comment on how it compares to scanning at home with NAPS2 (using an HP PSC printer as the scanner).

If I scan a 10-page exam (as described above) at 200 dpi, in colour, both the Sharp photocopier and NAPS2 return a PDF of around 4 MB (maybe slightly smaller with NAPS2). The next step I usually follow is to OCR and compress the PDF (using the medium compression setting) on the website Ave PDF (I have no relation with it at all: I am not spamming). The results are:

  1. Using a PDF scanned with the Sharp photocopier as the input: Ave PDF compresses it to a file of approximately 650 kB, and the printed text is still quite sharp.
  2. Using a PDF scanned with NAPS2 as the input: Ave PDF compresses it to a much larger file (around 2 MB if I remember correctly), and the printed text can be read but is blurry.
  3. I tried compressing the PDF scanned with NAPS2 on other websites, such as I love PDF, Adobe, PDF24, etc. (I am not related to any of them). The results did not improve. The most acceptable one was PDF24, keeping the 200 dpi resolution and setting the image quality to 0: the output size was 1.5 MB, the printed text could be read but was blurry, and the colours were altered to some extent.
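
To put these figures on a per-page basis, here is a minimal Python sketch of the arithmetic. It only uses the approximate sizes quoted in this post (the 2 MB figure is from memory), and it assumes 1 MB = 1000 kB; the names `summarize` and `compressed_kb` are made up for illustration.

```python
# Approximate sizes quoted in this post, in kB (assuming 1 MB = 1000 kB).
PAGES = 10
ORIGINAL_KB = 4000  # both scanners produce roughly 4 MB before compression

compressed_kb = {
    "Sharp + Ave PDF": 650,
    "NAPS2 + Ave PDF": 2000,            # "if I remember correctly"
    "NAPS2 + PDF24 (quality 0)": 1500,
}


def summarize(original_kb, results, pages):
    """Return size per page and size relative to the original for each result."""
    rows = {}
    for label, size_kb in results.items():
        rows[label] = {
            "kb_per_page": size_kb / pages,
            "percent_of_original": 100 * size_kb / original_kb,
        }
    return rows


summary = summarize(ORIGINAL_KB, compressed_kb, PAGES)
for label, row in summary.items():
    print(
        f"{label}: {row['kb_per_page']:.0f} kB/page, "
        f"{row['percent_of_original']:.1f}% of original"
    )
```

With these rough numbers, the Sharp scan ends up at about 65 kB per page after compression, while the NAPS2 scan stays at 150 to 200 kB per page.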

I have no idea how the Sharp photocopier scans: whether it detects text and images, what encoding or compression it uses, etc. I do not know how Ave PDF compresses PDFs either.

What I can see is that NAPS2 is very inconvenient for scanning documents of 10 or more pages, because the resulting PDF is very large and cannot be easily compressed.

Thank you in advance.

Regards.

There are a lot of different factors that go into compression (e.g. image quality, noise, "smart" changes like deleting empty parts of the page). #80 is a previous issue for this.