yob/pdf-reader

undefined method `objects' for #<PDF::Reader::FormXObject

vnazarenko opened this issue · 4 comments

I'm trying to extract images from a PDF file using the code from the examples section, and I'm getting this error:

/Users/vnazarenko/.rvm/gems/ruby-2.7.7/gems/pdf-reader-2.11.0/lib/pdf/reader/form_xobject.rb:33:in `initialize': undefined method `objects' for #<PDF::Reader::FormXObject:0x0000000107597f60> (NoMethodError)
Did you mean?  xobjects
               xobject

in this line:

        when :Form then
          count = process_page(PDF::Reader::FormXObject.new(page, stream), count)
        end

What should I do?

yob commented

That's a surprising error! Is the page variable in PDF::Reader::FormXObject.new(page, stream) definitely a PDF::Reader::Page instance?

Yes, and it responds to xobjects but not to objects.

yob commented

Could you make a minimal reproduction? I'm not sure how else I can debug it from here.

Also, is it only happening on a particular PDF file, or on all PDFs?

I just took the code from the example:

    def process_page(page, count)
      xobjects = page.xobjects
      return count if xobjects.empty?

      xobjects.each do |name, stream|
        case stream.hash[:Subtype]
        when :Image then
          count += 1

          case stream.hash[:Filter]
          when :CCITTFaxDecode then
            ExtractImages::Tiff.new(stream).save("#{page.number}-#{count}-#{name}.tif")
          when :DCTDecode      then
            ExtractImages::Jpg.new(stream).save("#{page.number}-#{count}-#{name}.jpg")
          else
            ExtractImages::Raw.new(stream).save("#{page.number}-#{count}-#{name}.tif")
          end
        when :Form then
          count = process_page(PDF::Reader::FormXObject.new(page, stream), count)
        end
      end
      count
    end

with something like:

    reader = PDF::Reader.new(@pdf_file_name)
    reader.pages.each_with_index do |page, idx|
      process_page(page, 0)
      # some code
    end

And I got this error. I've only tried one PDF, because my task was to extract images from that file. But I think the problem is not in the PDF; the problem is in the example code, in the :Form case.
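
One possible reading of the trace, offered as a guess rather than a confirmed diagnosis: the receiver in the error is a FormXObject, so the constructor was probably handed a FormXObject as its first argument instead of a page. That would happen when a form XObject contains another :Form, because the recursive call to process_page passes its own `page` argument (by then already a FormXObject) into FormXObject.new. The sketch below reproduces that shape with stand-in classes and shows one workaround, which is to thread the original page through the recursion. `FakePage`, `FakeForm`, the hash-based streams, and both `count_images` methods are hypothetical names for illustration and are not pdf-reader's API; whether passing the top-level page to the real constructor is correct is also an assumption.

```ruby
# Stand-ins that mimic only the behaviour visible in the stack trace.
# They are NOT pdf-reader classes.
class FakePage
  attr_reader :objects, :xobjects

  def initialize(xobjects)
    @objects = {}
    @xobjects = xobjects
  end
end

class FakeForm
  attr_reader :xobjects # note: no `objects` reader, like the error suggests

  def initialize(page, stream)
    @objects = page.objects # raises NoMethodError if `page` is a FakeForm
    @xobjects = stream.fetch(:children, {})
  end
end

# Same recursion shape as the example: the current container is reused
# as the "page" when building a nested form.
def count_images_naive(page, count = 0)
  page.xobjects.each do |_name, stream|
    case stream[:subtype]
    when :Image then count += 1
    when :Form  then count = count_images_naive(FakeForm.new(page, stream), count)
    end
  end
  count
end

# Workaround sketch: keep the real page separate from the container
# being walked, and always build forms from the real page.
def count_images(root_page, container = root_page, count = 0)
  container.xobjects.each do |_name, stream|
    case stream[:subtype]
    when :Image
      count += 1
    when :Form
      count = count_images(root_page, FakeForm.new(root_page, stream), count)
    end
  end
  count
end

# A page with one image and a form that itself contains a nested form.
inner = { subtype: :Form, children: { Im2: { subtype: :Image } } }
outer = { subtype: :Form, children: { Im1: { subtype: :Image }, Fm2: inner } }
page  = FakePage.new({ Im0: { subtype: :Image }, Fm1: outer })

puts count_images(page) # => 3

begin
  count_images_naive(page)
rescue NoMethodError => e
  puts "naive version failed on: #{e.name}" # the missing method is `objects`
end
```

If this is what is happening, the error would only show up on PDFs that nest one form XObject inside another, which would explain why it looks file-specific.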