elixir-image/image

Possible speedup for benchmark

jcupitt opened this issue · 3 comments

Hello, this project looks very interesting.

I don't know Elixir, but I wonder whether your small benchmark is hitting the libvips fast path?

https://github.com/kipcole9/image/blob/main/bench/vips_v_mogrify.exs

I'm not sure it's using sequential mode, and I don't think it's exploiting JPEG shrink-on-load (though I'm not certain, of course, sorry).

Here's a tiny benchmark in Python to show the difference:

#!/usr/bin/python3

import sys
import time
import pyvips

print("random access, using thumbnail_image:")
start = time.time()

x = pyvips.Image.new_from_file(sys.argv[1])
x = x.thumbnail_image(256)
x.write_to_file(sys.argv[2])

end = time.time()
print(f"took {(end - start) * 1000:.2f}ms")

print("sequential access, using thumbnail:")
start = time.time()

x = pyvips.Image.thumbnail(sys.argv[1], 256)
x.write_to_file(sys.argv[2])

end = time.time()
print(f"took {(end - start) * 1000:.2f}ms")

I see:

$ ./bench.py ~/pics/nina.jpg x.jpg
random access, using thumbnail_image:
took 398.28ms
sequential access, using thumbnail:
took 39.30ms

Where nina.jpg is a 6k x 4k jpeg.
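To give a feel for where most of that gap comes from, here is a rough sketch of the shrink-on-load idea. libjpeg can downscale by 2, 4 or 8 while decoding, so the loader never has to produce the full-size pixels. The function name and the selection rule below are simplified assumptions for illustration only; the rule libvips actually applies is more involved and may pick a different factor.

```python
# JPEG decoders (libjpeg) can downscale by these integer factors while decoding.
JPEG_SHRINK_FACTORS = (8, 4, 2, 1)


def pick_shrink_on_load(source_width: int, target_width: int) -> int:
    """Return the largest decode-time shrink that keeps the image at least
    as wide as the target. Simplified illustration, not libvips's own rule."""
    if source_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    for factor in JPEG_SHRINK_FACTORS:
        if source_width // factor >= target_width:
            return factor
    return 1


source_width, source_height = 6000, 4000  # roughly the "6k x 4k" image above
factor = pick_shrink_on_load(source_width, 256)
decoded_pixels = (source_width // factor) * (source_height // factor)
full_pixels = source_width * source_height

print(f"shrink-on-load factor: {factor}")
print(f"pixels decoded: {decoded_pixels:,} instead of {full_pixels:,}")
```

Under this simplified rule the decoder hands over 64 times fewer pixels before any resizing starts, which is the kind of saving that opening the whole image first and then calling thumbnail_image cannot get.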

You'll see a large drop in memory use too. If I comment out the thumbnail version and just measure thumbnail_image, I see:

$ /usr/bin/time -f %M:%e ./bench.py ~/pics/nina.jpg x.jpg
random access, using thumbnail_image:
took 434.63ms
sequential access, using thumbnail:
took 0.00ms
235708:0.59

i.e. a peak of roughly 240 MB of RAM. If I comment out the thumbnail_image version and just time thumbnail, I see:

$ /usr/bin/time -f %M:%e ./bench.py ~/pics/nina.jpg x.jpg
random access, using thumbnail_image:
took 0.00ms
sequential access, using thumbnail:
took 35.46ms
54208:0.17

Peak memory use of 54 MB.
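The reason for commenting out one variant per run is that peak memory is a process-wide high-water mark: once the heavy path has run, the figure never comes back down. If you would rather collect both numbers from inside Python, something like the sketch below might do. It uses the standard-library resource module (Unix only); the measure and allocate helpers are hypothetical names, and allocate is just a stand-in for the thumbnail call being measured.

```python
import resource
import sys
import time


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_ms, peak_rss).

    peak_rss is the process-wide high-water mark reported by getrusage:
    kilobytes on Linux, bytes on macOS. It never decreases, so it only
    describes fn if nothing heavier ran earlier in the same process.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed_ms, peak_rss


def allocate(n_bytes):
    # Hypothetical stand-in workload; swap in the thumbnail call under test.
    return len(bytearray(n_bytes))


if __name__ == "__main__":
    _, ms, peak = measure(allocate, 50_000_000)
    unit = "bytes" if sys.platform == "darwin" else "KB"
    print(f"took {ms:.2f}ms, peak RSS {peak} {unit}")
```

As with /usr/bin/time, each variant still needs its own fresh process for the peak figure to mean anything.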

If you're curious, there's a chapter in the docs explaining how the access mode flag works:

https://www.libvips.org/API/current/How-it-opens-files.md.html

And a page on the wiki about how thumbnail works:

https://github.com/libvips/libvips/wiki/HOWTO----Image-shrinking

We ought to move that into the main docs and update it.

@jcupitt Thanks very much for checking in. I'm surprised my little project got your attention!

I have read the docs you mentioned (and as much else as I can get my hands on). Those early benchmarks were written before I understood the difference in behaviour between thumbnail and thumbnail_image. My library implementation now makes it easy to use both: pass a file path and it uses thumbnail; pass an image and it uses thumbnail_image.

My API for Image.open also defaults to access=sequential, since that has the benefits you've outlined so well. It feels like the sanest default for a library whose primary intent is streamed transformations. If you think that's a mistake, please let me know!

libvips is an amazing piece of work, great craft. Its functional orientation, immutability and multi-threaded nature align really well with the Erlang BEAM VM on which Elixir runs.

I'll revisit the benchmarks when I finish up some image streaming work and will update this issue with the results. Many thanks again.

--Kip

Hi Kip, sure, that all sounds good. So the lines:

      {:ok, image} = Image.open(image_path)
      {:ok, image} = Image.resize(image, 250)

      out_path = Temp.path!(suffix: ".jpg", basedir: temp_dir)
      :ok = Image.write(image, out_path)

Turn into thumbnail behind the scenes? That's very neat.

Yes, I started off in functional programming, so libvips is supposed to be a bit like Haskell (lazy, pure, memoization, etc.). I think you're the first person to say this!

The runtime on which this runs supports functions with the same name and arity but with different types of parameters. So in this case, Image.resize can be called with either an Image parameter or a string that is considered to be a pathname:

  # We land here if `resize` is called with an image (`%Vimage{}` is the syntax for
  # referring to a data structure).
  def resize(%Vimage{} = image, width, options) when is_size(width) do
    with {:ok, options} <- Resize.validate_options(options) do
      Operation.thumbnail_image(image, width, options)
    end
  end

  # We land here if `resize` is called with a string. The `is_binary(image_path)` check
  # is called a guard clause: this function clause is selected only if the arguments
  # meet the guard conditions. In this language a string is a subtype of the binary
  # type.
  def resize(image_path, width, options) when is_binary(image_path) and is_size(width) do
    with {:ok, options} <- Resize.validate_options(options),
         {:ok, _file} <- file_exists?(image_path) do
      Operation.thumbnail(image_path, width, options)
    end
  end
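For readers more at home in Python, a loose analogue of the two clauses above can be sketched with functools.singledispatch. This is only an approximation: singledispatch looks at the type of the first argument alone and has no equivalent of guards such as is_size. FakeImage and the returned strings are hypothetical stand-ins, not part of pyvips or of this library.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class FakeImage:
    """Hypothetical stand-in for an already-opened image handle."""
    width: int


@singledispatch
def resize(source, width):
    # Fallback clause: no implementation registered for this argument type.
    raise TypeError(f"cannot resize {type(source).__name__}")


@resize.register
def _(source: FakeImage, width: int) -> str:
    # Already decoded, so only the in-memory path is available.
    return f"thumbnail_image(<image {source.width}px>, {width})"


@resize.register
def _(source: str, width: int) -> str:
    # A path: the loader can still pick sequential access and shrink-on-load.
    return f"thumbnail({source!r}, {width})"


print(resize(FakeImage(6000), 250))
print(resize("nina.jpg", 250))
```

The caller writes the same resize call either way, and the argument type decides which libvips operation ends up doing the work.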

I've taken up enough of your time, I'll close the issue with much thanks.