corkami/pocs

pdf.py: "warning: trying to repair broken xref", and "error: cannot recognize xref format"

sanderjo opened this issue · 1 comment

With pdf.py, on every PDF I've tried so far, I get errors like the ones below. Any tips on how to solve this?

For example:

$ python pdf.py bezuidenhout.pdf vakantieoverzicht.pdf 
error: expected 'obj' keyword (22 222 ?)
warning: trying to repair broken xref
warning: repairing PDF document

And with the provided example PDFs:

sander@sammie:~/git/pocs/collisions/scripts$ python pdf.py ../examples/poeMD5_A.pdf ../examples/poeMD5_B.pdf 
warning: PDF stream Length incorrect
warning: PDF stream Length incorrect
warning: PDF stream Length incorrect
warning: ... repeated 2 times ...
error: cannot recognize xref format
warning: trying to repair broken xref
warning: repairing PDF document
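The "cannot recognize xref format" error means the reader did not find a cross-reference table (or cross-reference stream) at the byte offset given after the file's last `startxref` keyword, so it falls back to rebuilding the xref by scanning the file ("trying to repair broken xref"). The Python sketch below approximates that first check so you can see what a file's `startxref` actually points at. It is an illustration under simplifying assumptions (only the last `startxref` is read, incremental-update chains are not followed), not MuPDF's actual code, and `xref_kind` is a hypothetical helper name.

```python
import re

# Whitespace bytes allowed by the PDF spec (NUL, TAB, LF, FF, CR, SPACE).
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def xref_kind(data: bytes) -> str:
    """Classify what the last `startxref` offset in a PDF points at.

    "table"  -> a classic cross-reference table starting with `xref`
    "stream" -> an `N G obj` header, which is how a cross-reference stream starts
    "broken" -> anything else, so a reader has to rebuild the xref by scanning
    """
    # Readers start from the end of the file: the number after the last
    # `startxref` keyword is the byte offset of the cross-reference data.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if not offsets:
        return "broken"
    offset = int(offsets[-1])
    if offset >= len(data):
        return "broken"
    # Tolerate stray whitespace before the keyword (a simplification).
    head = data[offset:].lstrip(PDF_WHITESPACE)
    if head.startswith(b"xref"):
        return "table"
    if re.match(rb"\d+\s+\d+\s+obj", head):
        return "stream"
    return "broken"
```

For example, `xref_kind(open("hacked.pdf", "rb").read())` returning "broken" would be consistent with the error above.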

The installed mutool version is 1.14, so it's up to date:

$ apt list --installed | grep mupdf
mupdf-tools/cosmic,now 1.14.0-0build1+ubuntu18.10 amd64 [installed]
mupdf/cosmic,now 1.14.0-0build1+ubuntu18.10 amd64 [installed]
angea commented

It's normal for these errors to appear, since the PDF was manipulated in an ugly way (that's why it's called hacked.pdf).

When cleaning it, mutool will always print something like the following, but it still repairs the file and writes the output:

error: cannot recognize xref format
warning: trying to repair broken xref
warning: repairing PDF document

import os
# (yes, errors will appear)
# -gggg: highest garbage-collection level; mutool rewrites the file with a fresh xref
os.system('mutool clean -gggg hacked.pdf cleaned.pdf')
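If pdf.py is driven from another script, a sketch like the following (Python 3, using `subprocess` instead of `os.system`) keeps mutool's warnings visible but judges success by the exit status and the presence of the output file, since the repair warnings are expected for these files. It assumes, as the comment above suggests, that mutool still exits successfully after printing repair warnings. `clean_pdf` and its `mutool` parameter are hypothetical names; the only mutool invocation used is the `mutool clean -gggg` command shown above.

```python
import shutil
import subprocess
from pathlib import Path


def clean_pdf(src: str, dst: str, mutool: str = "mutool") -> bool:
    """Rewrite `src` as `dst` using `mutool clean -gggg` (hypothetical helper).

    On hacked/collision PDFs mutool prints xref-repair warnings to stderr.
    Those are expected, so success is judged only by the exit status and by
    whether the output file exists (assumes warnings do not change the status).
    """
    exe = shutil.which(mutool)
    if exe is None:
        print(f"{mutool}: not found on PATH")
        return False
    result = subprocess.run(
        [exe, "clean", "-gggg", src, dst],
        capture_output=True,
        text=True,
    )
    # Show mutool's messages instead of hiding them; here they are informational.
    if result.stderr:
        print(result.stderr, end="")
    return result.returncode == 0 and Path(dst).is_file()
```

Usage: `clean_pdf("hacked.pdf", "cleaned.pdf")` returns True when mutool exits with status 0 and cleaned.pdf exists, even if repair warnings were printed along the way.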