MalformedPDFError Invalid filter algorithm 31
ollym opened this issue · 3 comments
PDF file:
EA9DDBD4F46B6A41F4CFC7FE3A222FAF8013C3CEAC0918D1E2A5.pdf
There seems to be some issue with the png_depredict function when running the code:
PDF::Reader.new(file).pages[0].xobjects[:I3].unfiltered_data
# => PDF::Reader::MalformedPDFError (Invalid filter algorithm 31):
That specific xobject is the QR code which we're trying to extract and parse, but we're struggling to get the unfiltered_data necessary to do so. I'll continue to try and debug, but may need someone else's help.
The image xobject looks like this:
<</Type /XObject
/Subtype /Image
/Width 100
/Height 100
/ColorSpace [/Indexed /DeviceRGB 1 23 0 R]
/BitsPerComponent 1
/Filter /FlateDecode
/DecodeParms <</Predictor 15 /Colors 1 /BitsPerComponent 1 /Columns 100>>
/Length 265>>
I'm fairly sure it's accurate that 31 isn't a valid filter type in the PNG format, but I suspect png_depredict isn't correctly parsing the data and it shouldn't be getting as far as thinking there's a filter type of 31. Maybe because it's a single bit per component? Or maybe because the colour space is indexed 🤔
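To make the single-bit hunch concrete: with /Columns 100, /Colors 1 and /BitsPerComponent 1, each scanline should be 13 data bytes (100 bits rounded up to whole bytes) plus one leading filter-type byte, so 14 bytes per row. A depredictor that assumed one byte per pixel would step 101 bytes per row and read pixel data as the filter byte, which could yield a bogus value such as 31. That is only a guess at the cause here. Below is a rough sketch of PNG depredicting that derives the row length from the bit depth; `png_row_bytes`, `paeth` and `png_depredict_sketch` are hypothetical names for illustration, not pdf-reader's actual implementation.

```ruby
# Hypothetical helpers for illustration only; not part of pdf-reader's API.

# Number of data bytes in one scanline, rounding partial bytes up.
def png_row_bytes(columns:, colors: 1, bits_per_component: 8)
  (columns * colors * bits_per_component + 7) / 8
end

# Paeth predictor as defined by the PNG specification.
def paeth(left, up, up_left)
  p = left + up - up_left
  pa = (p - left).abs
  pb = (p - up).abs
  pc = (p - up_left).abs
  if pa <= pb && pa <= pc
    left
  elsif pb <= pc
    up
  else
    up_left
  end
end

# Undo PNG row filters on already-inflated data.
# Each row is one filter-type byte followed by png_row_bytes data bytes.
def png_depredict_sketch(data, columns:, colors: 1, bits_per_component: 8)
  row_bytes = png_row_bytes(columns: columns, colors: colors,
                            bits_per_component: bits_per_component)
  # Bytes per complete pixel, rounded up (1 for sub-byte depths).
  bpp = (colors * bits_per_component + 7) / 8
  stride = row_bytes + 1
  unless (data.bytesize % stride).zero?
    raise ArgumentError, "data length is not a multiple of #{stride}"
  end

  prev = Array.new(row_bytes, 0)
  out = []
  data.bytes.each_slice(stride) do |filter, *row|
    cur = Array.new(row_bytes, 0)
    row.each_with_index do |byte, i|
      left    = i >= bpp ? cur[i - bpp] : 0
      up      = prev[i]
      up_left = i >= bpp ? prev[i - bpp] : 0
      predicted =
        case filter
        when 0 then 0                      # None
        when 1 then left                   # Sub
        when 2 then up                     # Up
        when 3 then (left + up) / 2        # Average
        when 4 then paeth(left, up, up_left)
        else raise ArgumentError, "Invalid filter algorithm #{filter}"
        end
      cur[i] = (byte + predicted) & 0xFF
    end
    out.concat(cur)
    prev = cur
  end
  out.pack("C*")
end
```

For the xobject above, `png_row_bytes(columns: 100, colors: 1, bits_per_component: 1)` gives 13, so the inflated stream would be expected to hold 100 rows of 14 bytes each.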
Unfortunately I'm fairly swamped at the moment with day job and family life so I won't be able to take a closer look for a while. Sorry!
Ouch, this has reminded me that there's only a single unit spec for the Flate filter with PNG shaped data 😬
pdf-reader/spec/reader/filter/flate_spec.rb
Lines 54 to 71 in 946559b
For those also having issues with this, we found HexaPDF was able to export the image correctly:
https://github.com/gettalong/hexapdf