ElectricRCAircraftGuy/PDF2SearchablePDF

Add -c compression option

ElectricRCAircraftGuy opened this issue · 3 comments

pdf2searchablepdf -c file.pdf

Shall produce file_searchable-comp1.pdf.

Make it a wrapper around this: https://askubuntu.com/a/243753/327339.

See if you can specify multiple resolutions or do multiple passes for further compression.

Allow -c1 (same as -c), -c2 for more compression, and -c3 for most compression.

Update my readme to explain how to manually do this compression after-the-fact too!

And update help menu with these new options.
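
A rough sketch of the wrapper idea above, not the final implementation. It assumes the linked answer's Ghostscript invocation (`gs -sDEVICE=pdfwrite ... -dPDFSETTINGS=...`). The mapping from `-c1`/`-c2`/`-c3` to the `/printer`, `/ebook`, and `/screen` presets is my assumption and is not decided in this ticket. The function names `gs_preset_for_level` and `print_gs_command` are hypothetical. The second function only prints the command so it can be inspected before anything runs.

```shell
# Hypothetical helper: map a -c level to a Ghostscript -dPDFSETTINGS preset.
# ASSUMPTION: this level-to-preset mapping is illustrative only.
gs_preset_for_level() {
    case "${1:-1}" in          # a bare -c (no number) is treated like -c1
        1) echo "/printer" ;;  # least compression of the three
        2) echo "/ebook" ;;    # medium compression
        3) echo "/screen" ;;   # most compression
        *) echo "invalid compression level: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the Ghostscript command for a level, input, and output.
# Filenames are not shell-quoted in the printed text; it is for display only.
print_gs_command() {
    local level="$1" in_pdf="$2" out_pdf="$3" preset
    preset="$(gs_preset_for_level "$level")" || return 1
    echo "gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=$preset" \
         "-dNOPAUSE -dQUIET -dBATCH -sOutputFile=$out_pdf $in_pdf"
}
```

For example, `print_gs_command 2 file_searchable.pdf file_searchable-comp2.pdf` prints the `/ebook` variant of the command, which is also roughly what the readme could show for doing the compression manually after-the-fact.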


pdf2searchablepdf -c1 file.pdf # low compression only
pdf2searchablepdf -c2 file.pdf # medium compression only
pdf2searchablepdf -c3 file.pdf # high compression only

Default is to output them all?

file_searchable_1.pdf # low compression 
file_searchable_2.pdf # medium compression 
file_searchable_3.pdf # high compression 
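
A minimal sketch of how those output names could be derived from the input name, assuming the `_searchable_N.pdf` suffix scheme listed above. The helper names `output_name` and `all_output_names` are hypothetical.

```shell
# Hypothetical helper: build "<base>_searchable_<level>.pdf" from an input PDF path.
output_name() {
    local input="$1" level="$2"
    local base="${input%.pdf}"   # strip a trailing ".pdf" if present
    printf '%s_searchable_%s.pdf\n' "$base" "$level"
}

# Hypothetical helper: list the names for the "output them all" default.
all_output_names() {
    local level
    for level in 1 2 3; do
        output_name "$1" "$level"
    done
}
```

Calling `all_output_names file.pdf` prints the three names listed above, one per line.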

Use Ghostscript after-the-fact, to do compression only, on an already-processed PDF.
See my answer: https://askubuntu.com/questions/113544/how-can-i-reduce-the-file-size-of-a-scanned-pdf-file/1303196#1303196

pdf2searchablepdf --compress-only=low    file_searchable_1.pdf
pdf2searchablepdf --compress-only=medium file_searchable_1.pdf
pdf2searchablepdf --compress-only=high   file_searchable_1.pdf

Nah... use small, medium, large instead of low, medium, high.

Maybe --size=small, etc.
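
A possible sketch of parsing the `--size=` form. It assumes "size" refers to the output file size, so `small` means the most compression and `large` the least. That interpretation is a guess and is not settled in this ticket. The function name `size_to_level` is hypothetical, and the 1 to 3 levels are the same ones used for `-c1` through `-c3`.

```shell
# Hypothetical helper: translate a --size=... option into a compression level.
# ASSUMPTION: small output file = most compression (level 3).
size_to_level() {
    case "$1" in
        --size=small)  echo 3 ;;
        --size=medium) echo 2 ;;
        --size=large)  echo 1 ;;
        *) echo "unrecognized option: $1" >&2; return 1 ;;
    esac
}
```

Anything other than the three expected values returns a non-zero status, so the caller can print the help menu instead of guessing.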

TODO: The commit below partially fulfills this ticket.

  • I still need to add --size=small, --size=medium, and --size=large options.

Also:

  • Post-processing the PDF is a crude way to do it, but it's better than nothing. A better approach in the future would be to run OCR on the high-quality images, output the OCR data to an intermediate format, then compress the images as desired and overlay the OCR data onto the custom-compressed images. That will have to be future work.
    • Let's make that a new ticket: #27