toy/image_size

undefined method `unpack' for nil:NilClass while fetching TIFF image size from partial data

Closed this issue · 3 comments

When trying to get the size of a TIFF image from a partial chunk of its data, I end up with the following exception:

Traceback (most recent call last):
        5: from bin/console:14:in `<main>'
        4: from (irb):17
        3: from (irb):17:in `new'
        2: from /Users/gottfrois/.rvm/gems/ruby-2.6.6/gems/image_size-2.0.2/lib/image_size.rb:65:in `initialize'
        1: from /Users/gottfrois/.rvm/gems/ruby-2.6.6/gems/image_size-2.0.2/lib/image_size.rb:227:in `size_of_tiff'
NoMethodError (undefined method `unpack' for nil:NilClass)

Steps to reproduce:

2.6.6 :001 > require 'open-uri'
 => true
2.6.6 :002 > fd = URI.parse('https://effigis.com/wp-content/themes/effigis_2014/img/Airbus-Spot6-50cm-St-Benoit-du-Lac-Quebec-2014-09-04.tif').open('rb')
 => #<Tempfile:/var/folders/_9/fzxprxqn2_10xdbghn1fb7q40000gn/T/open-uri20200724-10967-n9efp3>
2.6.6 :003 > ImageSize.new(fd.read(100))

The idea is to get the image size without reading the full image, for performance reasons. But after reading about the TIFF file header format, I wonder whether that is even possible. As far as I understand, the header only gives you the offset where the image information starts, and that offset can be a large number that points to a nil value when not all of the data is loaded...
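For reference, a minimal sketch of what the 8-byte TIFF header contains, assuming the standard layout: a byte-order mark, the magic number 42, then a 4-byte offset to the first image file directory (IFD), where the width and height tags are stored. The helper name `tiff_first_ifd_offset` and the sample offset are made up for illustration; this is not how image_size itself is implemented.

```ruby
# Parse the 8-byte TIFF header and return the byte offset of the first IFD.
def tiff_first_ifd_offset(data)
  raise ArgumentError, 'need at least 8 bytes' if data.nil? || data.bytesize < 8

  # "II" means little-endian, "MM" means big-endian.
  short, long = case data.byteslice(0, 2)
                when 'II' then %w[v V]
                when 'MM' then %w[n N]
                else raise ArgumentError, 'not a TIFF byte-order mark'
                end

  magic = data.byteslice(2, 2).unpack1(short)
  raise ArgumentError, 'bad TIFF magic number' unless magic == 42

  data.byteslice(4, 4).unpack1(long)
end

# Hypothetical 100-byte prefix: a little-endian header whose IFD offset
# (an illustrative value) lies far beyond the bytes actually held.
prefix = ['II', 42, 100_083_000].pack('a2vV') + ("\0" * 92)
offset = tiff_first_ifd_offset(prefix)

# The IFD starts with a 2-byte entry count, so at least offset + 2 bytes
# are needed before any tag can be read.
ifd_available = offset + 2 <= prefix.bytesize
puts "first IFD at byte #{offset}, available in prefix: #{ifd_available}"
```

With a prefix like this, the header parses fine but the directory it points to is simply not in the data, which is consistent with the nil in the traceback above.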

Nevertheless, it might be a good idea to raise an ImageSize::FormatError in this case instead of an undefined method error. What do you think?

toy commented

Thank you for opening the issue, it was interesting to dig a bit deeper.
I've debugged what is read from the image to determine its size and, unfortunately, the width and height information is at offsets 100_083_302 and 100_083_314, which is ~66kB from the end of the file. You are right, having meta information at the end of the file is common for TIFF files. The second problem is that open-uri fetches everything, so the small chunk is read from a tempfile containing the completely downloaded file.

As for raising ImageSize::FormatError, I certainly agree.

I wrote some quick-and-dirty code for getting the TIFF size without downloading the complete file. It also brought some ideas for improving the reading of local files.

require 'net/http'
require 'image_size'

class ImageSize
  remove_const(:ImageReader)
  
  class ImageReader
    def initialize(uri)
      raise ArgumentError, "expected instance of URI" unless uri.is_a?(URI)
      @uri = uri
      @http = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https', keep_alive_timeout: 60)
      @chunks = {}
    end

    # def size
    #   @size ||= Integer(@http.head(@uri)['Content-Length'])
    # end

    CHUNK = 1024
    def [](offset, length)
      # Ignoring the fact that requested data can span multiple chunks
      chunk_number, chunk_offset = offset.divmod(CHUNK)
      @chunks[chunk_number] ||= @http.get(@uri, 'Range' => "bytes=#{chunk_number * CHUNK}-#{(chunk_number + 1) * CHUNK - 1}").body
      
      # Ignoring error handling
      @chunks[chunk_number][chunk_offset, length]
    end
  end
end

p ImageSize.new(URI('https://effigis.com/wp-content/themes/effigis_2014/img/Airbus-Spot6-50cm-St-Benoit-du-Lac-Quebec-2014-09-04.tif'))
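The code above notes that it ignores reads spanning multiple chunks. One way that gap could be closed is sketched below, under assumptions: `ChunkedReader` and its `fetch` block are hypothetical names, and the remote file is replaced by an in-memory string so the example runs without network access.

```ruby
# Caches fixed-size chunks and stitches together reads that cross chunk
# boundaries. `fetch` is a hypothetical callable taking (first_byte, last_byte)
# and returning that inclusive byte range (nil or short data past the end).
class ChunkedReader
  CHUNK = 1024

  def initialize(&fetch)
    @fetch = fetch
    @chunks = {}
  end

  # Same shape as String#[offset, length]; returns nil when offset is past
  # the end of the available data.
  def [](offset, length)
    return ''.b if length <= 0

    first = offset / CHUNK
    last = (offset + length - 1) / CHUNK
    data = (first..last).map { |number| chunk(number) }.join
    data.byteslice(offset - first * CHUNK, length)
  end

  private

  # Fetch each chunk at most once.
  def chunk(number)
    @chunks[number] ||= @fetch.call(number * CHUNK, (number + 1) * CHUNK - 1).to_s.b
  end
end

# Stand-in for the remote file: 3000 bytes served from memory.
source = Array.new(3000) { |i| i % 256 }.pack('C*')
requests = []
reader = ChunkedReader.new do |from, to|
  requests << [from, to]
  source.byteslice(from, to - from + 1)
end

# This read crosses the boundary between chunk 0 and chunk 1.
spanning = reader[1020, 8]
puts "read #{spanning.bytesize} bytes using #{requests.size} range requests"
```

In the HTTP case the block would wrap the same `@http.get(@uri, 'Range' => "bytes=#{from}-#{to}").body` call used above, assuming the server honours Range requests.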

gottfrois commented

Great feedback!
I'm currently using your gem in mine, and for now I simply rescue the NoMethodError, but this is more of a hack than anything...

https://github.com/gottfrois/image_info/blob/master/lib/image_info/parser.rb#L32

Not sure what we can do here other than cleanly raising a format error and improving the codebase so that it does not expect something to be present at a given offset when the image may have been only partially loaded in the first place.
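A minimal sketch of that idea, assuming a single guarded read helper through which every offset-based access goes. `read_exactly` is a hypothetical name, the local `FormatError` class is only a stand-in for ImageSize::FormatError so the example runs without the gem, and the `'v'` format string is illustrative.

```ruby
# Stand-in for ImageSize::FormatError so the sketch runs without the gem.
class FormatError < StandardError; end

# Read exactly `length` bytes at `offset`, raising FormatError instead of
# returning nil or a short string when the data is not there.
def read_exactly(data, offset, length)
  chunk = data.byteslice(offset, length)
  if chunk.nil? || chunk.bytesize < length
    got = chunk ? chunk.bytesize : 0
    raise FormatError, "expected #{length} bytes at offset #{offset}, got #{got}"
  end
  chunk
end

# 100 bytes of partial data, as in the report; the offset is the one
# mentioned earlier in this thread for the width.
partial = ("\0" * 100).b
message = begin
  read_exactly(partial, 100_083_302, 2).unpack1('v')
rescue FormatError => e
  e.message
end
puts message
```

A caller can then rescue one well-defined error class instead of NoMethodError.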

toy commented

@gottfrois It would be great if you could check the wip branch.