etemesi254/zune-image

`zune-jpeg` 2x slower than `image` crate on my M1 MacBook

Closed this issue · 4 comments

emilk commented

I am really intrigued by zune-image based on its compilation time and boasted speed, but my first impression is that it is much slower than the image crate.

I am decoding 1440 × 1920 RGB JPEGs. `image` takes around 17 ms per JPEG, while `zune-jpeg` takes around 33 ms. I am NOT using the `jpeg_rayon` feature of `image`.

Here is one of the images that decodes slowly:

(attached image: 53)

(the image comes from saving frames of videos from https://github.com/google-research-datasets/Objectron)

Hi, it's because our IDCT is not vectorized, while jpeg-decoder contains a vectorized one. There was some effort to address that, and there was a branch which achieved it. I was going to merge it manually, but time became an issue, and I don't have the hardware to test it.
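For readers unfamiliar with the step being discussed, here is a minimal scalar sketch of the 8×8 inverse DCT that a JPEG decoder applies to every block. It is not zune-jpeg's or jpeg-decoder's code (real decoders typically use fixed-point integer arithmetic and SIMD); the function names `idct_1d` and `idct_8x8` are hypothetical. It only illustrates why the work is regular enough to benefit from vectorization: the same 8-point transform runs over 8 rows and then 8 columns.

```rust
use std::f32::consts::{FRAC_1_SQRT_2, PI};

/// Reference (unoptimized) 8-point 1-D inverse DCT:
/// x[n] = 1/2 * sum_k c(k) * X[k] * cos((2n + 1) * k * pi / 16),
/// where c(0) = 1/sqrt(2) and c(k) = 1 otherwise.
fn idct_1d(input: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for (n, o) in out.iter_mut().enumerate() {
        let mut sum = 0.0f32;
        for (k, &coeff) in input.iter().enumerate() {
            let scale = if k == 0 { FRAC_1_SQRT_2 } else { 1.0 };
            let angle = (2 * n + 1) as f32 * k as f32 * PI / 16.0;
            sum += scale * coeff * angle.cos();
        }
        *o = 0.5 * sum;
    }
    out
}

/// Separable 8x8 inverse DCT: transform each row, then each column.
/// A SIMD version would process several rows or columns per instruction.
fn idct_8x8(block: &[[f32; 8]; 8]) -> [[f32; 8]; 8] {
    // Pass 1: rows.
    let mut rows = [[0.0f32; 8]; 8];
    for (r, row) in block.iter().enumerate() {
        rows[r] = idct_1d(row);
    }
    // Pass 2: columns of the row-transformed block.
    let mut out = [[0.0f32; 8]; 8];
    for c in 0..8 {
        let mut col = [0.0f32; 8];
        for r in 0..8 {
            col[r] = rows[r][c];
        }
        let res = idct_1d(&col);
        for r in 0..8 {
            out[r][c] = res[r];
        }
    }
    out
}
```

A 1440 × 1920 image contains 43,200 such 8×8 blocks per full-resolution component, so the cost of this inner loop adds up quickly.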

Merged a NEON-optimized IDCT into the dev branch, which should improve decode times considerably on Arm NEON chips.

I can't really test by how much, but I would love it if you could report speeds on your side.

emilk commented

I'm sorry, my original numbers were bad somehow. I've now done better measurements, and this is what I found:

  • image: 15.6 ms per jpeg
  • zune-jpeg 0.3.17: 10.3 ms per jpeg
  • zune-jpeg dev branch: 8.2 ms per jpeg

I also measured in Firefox when compiling to WASM:

  • image: 35 ms per jpeg
  • zune-jpeg 0.3.17: 29 ms per jpeg
  • zune-jpeg dev branch: 29 ms per jpeg

So: seems like I'm switching to zune-jpeg 🤘😃
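For anyone who wants to reproduce per-JPEG numbers like the ones above, here is a minimal timing sketch. It is an assumption about how such a measurement could be done, not the harness actually used in this thread; `mean_ms_per_call` is a hypothetical helper, and the decoder call is passed in as a closure so the same code can time either crate. Warm-up runs are excluded so that one-time costs do not skew the average.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Mean wall-clock milliseconds per call of `decode`, measured over `iters`
/// timed runs that follow `warmup` untimed runs.
fn mean_ms_per_call<T>(warmup: u32, iters: u32, mut decode: impl FnMut() -> T) -> f64 {
    assert!(iters > 0, "need at least one timed iteration");
    // Untimed warm-up: fills caches and lets lazy initialization happen.
    for _ in 0..warmup {
        black_box(decode());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the decoded result.
        black_box(decode());
    }
    start.elapsed().as_secs_f64() * 1000.0 / f64::from(iters)
}
```

Usage would look like `mean_ms_per_call(3, 50, || decode_with_some_crate(&bytes))`, where `decode_with_some_crate` is a placeholder for whichever decoder call is being compared. Build with `--release`, since debug builds can distort the comparison considerably.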

Good to hear that.

WASM speeds can be improved by using WASM intrinsics. If I happen to have time I may implement it, but no promises there :)