yob/pdf-reader

MalformedPDFError Invalid filter algorithm 31

ollym opened this issue · 3 comments

ollym commented

PDF file:
EA9DDBD4F46B6A41F4CFC7FE3A222FAF8013C3CEAC0918D1E2A5.pdf

There seems to be an issue with the png_depredict function when running this code:

PDF::Reader.new(file).pages[0].xobjects[:I3].unfiltered_data

# => PDF::Reader::MalformedPDFError (Invalid filter algorithm 31):

That specific XObject is the QR code we're trying to extract and parse, but we're struggling to get the unfiltered_data needed to do so. I'll keep trying to debug it, but may need someone else's help.

yob commented

The image xobject looks like this:

<</Type /XObject
/Subtype /Image
/Width 100
/Height 100
/ColorSpace [/Indexed /DeviceRGB 1 23 0 R]
/BitsPerComponent 1
/Filter /FlateDecode
/DecodeParms <</Predictor 15 /Colors 1 /BitsPerComponent 1 /Columns 100>>
/Length 265>>

I'm fairly sure it's correct that 31 isn't a valid filter type in the PNG format (the valid types are 0 to 4), but I suspect png_depredict isn't parsing the data correctly and shouldn't be getting as far as thinking there's a filter type of 31. Maybe it's because there's a single bit per component? Or maybe because the colour space is indexed 🤔
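
The first hunch (a single bit per component) can be checked with some arithmetic. With a PNG predictor, each row of the inflated data starts with one filter-type byte, followed by ceil(Colors * BitsPerComponent * Columns / 8) bytes of pixel data. Below is a minimal sketch, assuming the inflated stream follows the /DecodeParms above exactly; the helper names are hypothetical and not part of pdf-reader.

```ruby
# Hypothetical helpers for checking PNG-predicted row layout; not part of pdf-reader.

# Bytes of pixel data in one row, excluding the leading filter-type byte.
def png_row_data_bytes(columns:, colors: 1, bits_per_component: 8)
  (columns * colors * bits_per_component + 7) / 8
end

# Full row length in the inflated stream: 1 filter byte + the row's pixel data.
def png_row_stride(**params)
  png_row_data_bytes(**params) + 1
end

# True if the data splits evenly into rows of `stride` bytes and every row
# starts with a PNG filter type in the range 0..4.
def valid_png_filter_bytes?(data, stride)
  bytes = data.unpack("C*")
  return false unless (bytes.length % stride).zero?

  bytes.each_slice(stride).all? { |row| row.first.between?(0, 4) }
end

# The I3 image: /Columns 100 /Colors 1 /BitsPerComponent 1, /Height 100.
stride = png_row_stride(columns: 100, colors: 1, bits_per_component: 1)
puts stride       # => 14 (1 filter byte + 13 data bytes)
puts stride * 100 # => 1400 bytes expected after inflating

# If the bit depth were ignored and every column treated as a whole byte:
puts png_row_stride(columns: 100, colors: 1, bits_per_component: 8) # => 101
```

If a depredictor treated each of the 100 columns as a whole byte, it would walk the 1400 inflated bytes in 101-byte rows and read its "filter type" from the middle of the pixel data, where a value like 31 is unremarkable. That would be consistent with the error, but it is a guess rather than a confirmed diagnosis.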

Unfortunately I'm fairly swamped at the moment with my day job and family life, so I won't be able to take a closer look for a while. Sorry!

yob commented

Ouch, this has reminded me that there's only a single unit spec for the Flate filter with PNG-predicted data 😬

context "deflated stream with PNG predictors" do
  let(:deflated_path) {
    File.dirname(__FILE__) + "/../../data/deflated_with_predictors.dat"
  }
  let(:depredicted_path) {
    File.dirname(__FILE__) + "/../../data/deflated_with_predictors_result.dat"
  }
  let(:deflated_data) { binread(deflated_path) }
  let(:depredicted_data) { binread(depredicted_path) }

  it "inflates the data" do
    filter = PDF::Reader::Filter::Flate.new(
      :Columns => 5,
      :Predictor => 12
    )
    expect(filter.filter(deflated_data)).to eql(depredicted_data)
  end
end
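
The existing spec passes only :Columns and :Predictor, so it relies on the PDF defaults of one colour component and 8 bits per component. Covering the failing case would need 1-bit data. The sketch below is not pdf-reader's implementation; it is a standalone illustration, using only Ruby's zlib, of how such a fixture could be built and how a PNG depredictor can handle sub-byte depths. Following the PNG specification, it computes the row length in bits and rounds the filter offset ("bytes per pixel") up to at least one byte. The module and method names are hypothetical.

```ruby
require "zlib"

# Hypothetical sketch of PNG predictor handling (PDF /Predictor >= 10).
# This is NOT pdf-reader's code; names are illustrative only.
module PngPredictorSketch
  module_function

  # Paeth predictor from the PNG specification.
  def paeth(left, up, up_left)
    estimate = left + up - up_left
    d_left = (estimate - left).abs
    d_up = (estimate - up).abs
    d_up_left = (estimate - up_left).abs
    if d_left <= d_up && d_left <= d_up_left
      left
    elsif d_up <= d_up_left
      up
    else
      up_left
    end
  end

  # Reverses PNG row filters on already-inflated data.
  def depredict(data, columns:, colors: 1, bits_per_component: 8)
    row_bytes = (columns * colors * bits_per_component + 7) / 8
    # Filter offset: bytes per complete pixel, rounded up to at least 1.
    bpp = [(colors * bits_per_component + 7) / 8, 1].max
    stride = row_bytes + 1 # leading filter-type byte + pixel data
    bytes = data.unpack("C*")
    unless (bytes.length % stride).zero?
      raise ArgumentError, "data is not a whole number of #{stride}-byte rows"
    end

    prior = Array.new(row_bytes, 0)
    output = []
    bytes.each_slice(stride) do |filter, *row|
      row.each_index do |i|
        left = i >= bpp ? row[i - bpp] : 0 # already decoded in this row
        up = prior[i]
        up_left = i >= bpp ? prior[i - bpp] : 0
        decoded =
          case filter
          when 0 then row[i]
          when 1 then row[i] + left
          when 2 then row[i] + up
          when 3 then row[i] + (left + up) / 2
          when 4 then row[i] + paeth(left, up, up_left)
          else raise ArgumentError, "invalid PNG filter type #{filter}"
          end
        row[i] = decoded & 0xFF
      end
      output.concat(row)
      prior = row
    end
    output.pack("C*")
  end

  # Builds a fixture: applies the Up filter (type 2) to every row,
  # prefixes each row with its filter byte, then deflates.
  def predict_up_and_deflate(raw, row_bytes)
    prior = Array.new(row_bytes, 0)
    filtered = raw.unpack("C*").each_slice(row_bytes).flat_map do |row|
      encoded = row.each_index.map { |i| (row[i] - prior[i]) & 0xFF }
      prior = row
      [2] + encoded
    end
    Zlib::Deflate.deflate(filtered.pack("C*"))
  end
end

# A 100x100, 1-bit image like I3: 13 data bytes per row (arbitrary values).
raw = Array.new(100 * 13) { |i| (i * 37) % 256 }.pack("C*")
inflated = Zlib::Inflate.inflate(PngPredictorSketch.predict_up_and_deflate(raw, 13))
puts inflated.bytesize # => 1400
decoded = PngPredictorSketch.depredict(inflated, columns: 100, bits_per_component: 1)
puts decoded == raw # => true
```

The same sketch raises on a filter byte of 31, mirroring the error in this issue, so it could also back a spec that checks malformed rows are rejected.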

ollym commented

For those also having issues with this, we found HexaPDF was able to export the image correctly:
https://github.com/gettalong/hexapdf