paulirish/speedline

Improve SSIM implementation


I was interested in how the SSIM implementation we consume (image-ssim) differs from the reference implementation.

visualmetrics uses https://github.com/jterrace/pyssim, which points to the reference implementation at https://ece.uwaterloo.ca/~z70wang/research/ssim/. The reference table there with the Einstein images lists the last SSIM as 0.662.
However, our module (via http://darosh.github.io/image-ssim-js/test/browser_test.html) reports this pair as 0.741.
The implementation also looks fairly different from most of the other SSIM modules I'm looking at.
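
For concreteness, the single-scale SSIM formula from the Wang et al. paper is simple; implementations mostly diverge in windowing, downsampling, and constants, which likely explains gaps like 0.662 vs 0.741. Here's a rough sketch using global (whole-image) statistics instead of the paper's 11x11 Gaussian sliding window, so its scores won't match any of the modules exactly; it's only meant to show which knobs can differ:

```js
// Minimal single-scale SSIM sketch (grayscale, global statistics).
// The reference computes SSIM over an 11x11 Gaussian sliding window and
// averages the per-window map; this global variant is illustration only.
function globalSSIM(a, b) {
  if (a.length !== b.length) throw new Error('images must be the same size');
  const n = a.length;
  const L = 255;               // dynamic range of 8-bit pixels
  const c1 = (0.01 * L) ** 2;  // stabilizing constants (K1=0.01, K2=0.03)
  const c2 = (0.03 * L) ** 2;

  let muA = 0;
  let muB = 0;
  for (let i = 0; i < n; i++) { muA += a[i]; muB += b[i]; }
  muA /= n;
  muB /= n;

  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - muA;
    const db = b[i] - muB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n - 1;
  varB /= n - 1;
  cov /= n - 1;

  return ((2 * muA * muB + c1) * (2 * cov + c2)) /
         ((muA * muA + muB * muB + c1) * (varA + varB + c2));
}
```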

There are two newer SSIM modules I've seen: https://github.com/obartra/ssim and https://github.com/IonicaBizau/img-ssim

However, both of them are unattractive in their current state.

Regardless, I'd still like to keep this issue open to explore changing our SSIM dependency.
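
If we do swap the dependency, one low-risk shape is to hide the SSIM backend behind a tiny adapter so candidates can be compared side by side on the same frames. A sketch, with the caveat that the exact call shapes below are assumptions taken from each module's README and should be re-verified:

```js
// Hypothetical adapter: each backend takes two {data, width, height, channels}
// frames and returns a similarity score in [0, 1].
const imageSSIM = require('image-ssim');

const backends = {
  // Current dependency; compare() returning {ssim, mcs} is assumed here.
  'image-ssim': (a, b) => imageSSIM.compare(a, b).ssim,

  // Candidate: https://github.com/obartra/ssim (published as ssim.js).
  // Uncomment after verifying the API:
  // 'ssim.js': (a, b) => require('ssim.js').ssim(a, b).mssim,
};

function frameSimilarity(backendName, frameA, frameB) {
  const compare = backends[backendName];
  if (!compare) throw new Error(`unknown SSIM backend: ${backendName}`);
  return compare(frameA, frameB);
}
```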

This is a very interesting and valid question. How could we evaluate or compare these different SSIM implementations? Looking at one or two videos isn't convincing on its own (choice of webpages and sampling issues will complicate things).

One potential idea would be to take all the WPT videos from the SpeedPerception study, compute every SSIM variation, and see which variation best explains the human judgments. Such a comparison makes much more sense with the Phase-2 data (currently live), because it includes both the Alexa and InternetRetailer top-K lists along with both mobile and desktop rendering modes. In the Phase-1 deep-dive analysis we already compared several different metrics, so adding different SSIM variations into the mix is fairly easy to do. Let me know if you see any problems with this line of thought.
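
To make that evaluation concrete: compute each SSIM variant's per-video score, then rank-correlate each variant against the aggregated human judgments; the variant with the strongest correlation explains the human data best. A minimal sketch, where the per-video metric scores and human preference scores are placeholder inputs:

```js
// Spearman rank correlation between a metric's per-video scores and
// aggregated human judgments. Inputs are parallel arrays; a higher |rho|
// means the variant better matches the human ranking.
function ranks(values) {
  const order = values.map((v, i) => [v, i]).sort((x, y) => x[0] - y[0]);
  const r = new Array(values.length);
  order.forEach(([, originalIndex], rank) => { r[originalIndex] = rank + 1; });
  return r; // ties are not averaged; fine for a rough first pass
}

function spearman(metricScores, humanScores) {
  const ra = ranks(metricScores);
  const rb = ranks(humanScores);
  const n = ra.length;
  let d2 = 0;
  for (let i = 0; i < n; i++) d2 += (ra[i] - rb[i]) ** 2;
  return 1 - (6 * d2) / (n * (n * n - 1));
}

// e.g. spearman(perVideoScoresForVariant, perVideoHumanPreferences)
```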