stlehmann/pdftools

pdftools merge fails for some PDFs

cjfp opened this issue · 1 comment

cjfp commented

When I try to merge a PDF of a Virgin Mobile phone bill, pdftools crashes on Windows 7 / Cygwin.

$ pdftools merge -o test.pdf virgin.pdf
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 229, in __new__
return decimal.Decimal.__new__(cls, utils.str_(value), context)
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/pdftools", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/pdftools/_cli.py", line 274, in main
pdf_merge(ARGS.src, ARGS.output, ARGS.delete)
File "/usr/local/lib/python3.8/site-packages/pdftools/pdftools.py", line 42, in pdf_merge
writer.write(outputfile)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
newobj = self._sweepIndirectReferences(externMap, newobj)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 579, in readFromStream
value = readObject(stream, pdf)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 92, in readObject
return NumberObject.readFromStream(stream)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 271, in readFromStream
return FloatObject(num)
File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 231, in __new__
return decimal.Decimal.__new__(cls, str(value))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

$ pip list
Package    Version
---------- -------
pdftools   2.0.2
pip        21.3.1
PyPDF2     1.26.0
setuptools 59.1.1

If I open the PDF in Adobe, optimize it, and save it to a new file, the merge works without problems. Do you have any suggestions for handling this from the command line? Unfortunately, I don't have a PDF that reproduces the problem without tons of private information.
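
The last frames of the traceback show `FloatObject` passing a numeric token read from the PDF to `decimal.Decimal`, which rejects it with `ConversionSyntax`. Below is a minimal sketch of that failure mode using only the Python standard library. The token `"0.0-5"` is a made-up example (the actual offending value in the bill is not known), and `is_valid_number` and `lenient_decimal` are hypothetical helpers, not part of PyPDF2 or pdftools.

```python
import decimal


def is_valid_number(token: str) -> bool:
    """Return True if decimal.Decimal accepts the token as a number."""
    try:
        decimal.Decimal(token)
        return True
    except decimal.InvalidOperation:
        # Same exception class as in the traceback (ConversionSyntax).
        return False


def lenient_decimal(token: str) -> decimal.Decimal:
    """Parse a numeric token, falling back to 0 for malformed input.

    The fallback is an assumption made for illustration; it is not
    what PyPDF2 1.26.0 does, which is why the merge aborts.
    """
    try:
        return decimal.Decimal(token)
    except decimal.InvalidOperation:
        return decimal.Decimal(0)


if __name__ == "__main__":
    for token in ("12.25", "0.0-5"):  # second token is a made-up malformed value
        print(repr(token), is_valid_number(token), lenient_decimal(token))
```

A lenient fallback like this would let processing continue, at the cost of silently changing the malformed value, so it is probably only acceptable when the number is not critical to the page content.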

@cjfp thanks for reporting. As pdftools is just a CLI for PyPDF2, I suggest updating PyPDF2 to the newest version to see whether that solves the problem.
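
To confirm which versions are actually in use before and after an upgrade, a small check like the following may help. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on `importlib.metadata`, which is available in the standard library from Python 3.8 (the version shown in the traceback paths).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip list output in the report.
    for name in ("pdftools", "PyPDF2"):
        print(name, installed_version(name) or "not installed")
```

Running this in the same environment as the `pdftools` command shows whether the upgrade took effect there, which matters when several Python installations coexist.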