getflywheel/local-addon-image-optimizer

Image Optimizer can't detect duplicate images


What’s not working: if you upload the same image twice, Image Optimizer will only detect it once.

  • Reason: Images are considered unique based on a hash of the image file content, so if you upload the same file twice, the second copy produces a duplicate hash and is not treated as a separate image.
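
To illustrate the reason above, here is a minimal sketch of a content-hash index. It is not the addon's actual code: `indexByContentHash` and the file paths are hypothetical, and it assumes the index is a map keyed by the md5 digest of the file bytes.

```typescript
import { createHash } from "crypto";

// Hypothetical stand-in for an image index keyed by md5 digest (not the addon's real code).
function indexByContentHash(
  files: Array<{ path: string; content: Buffer }>
): Map<string, string> {
  const index = new Map<string, string>();
  for (const file of files) {
    const digest = createHash("md5").update(file.content).digest("hex");
    // A second file with identical bytes maps to the same key and replaces the first entry.
    index.set(digest, file.path);
  }
  return index;
}

const sameBytes = Buffer.from("same image bytes");
const collapsedIndex = indexByContentHash([
  { path: "uploads/cool-image.jpg", content: sameBytes },
  { path: "uploads/cool-image-1.jpg", content: sameBytes },
]);
```

Two uploads go in, but `collapsedIndex` ends up with a single entry, which matches the symptom described in this issue.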

What should be happening: if you upload the same image twice, you should be able to optimize both of them.

  • Possible fixes: Look at the last modified time, creation time, or file path.

Steps to recreate:

  • Upload an image (cool-image.jpg) to the WordPress media library.
  • Scan for images with Image Optimizer.
  • Optimize all found images.
  • Upload the same image (cool-image.jpg) to the WordPress media library again.
  • Scan for images. The overview tab will report that there are 10 new images found to optimize.
  • Click "view images" and you won't see any images available for optimization.

Can't believe this didn't come up sooner!

This is something that I made a mental note of when first writing the addon but then never had the time to implement a fix. My two cents: I think using a file path is the best route here.

Off the top of my head, there are two ways a file path could be used to accomplish this:

  1. Append the file path to the file buffer that is passed to the md5 hash function.
  2. Store a hash map of processed image file paths and use that in conjunction with file md5 hashes to determine whether an image has been processed yet (e.g. an image has been processed only if the image's md5 hash is found in imageData and the image path is in the hash map).

The second option (path hash map) seems more favorable to me as I think it will be more flexible, less prone to bugs and easier to maintain.
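
The two options above could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, `imageData` is assumed to be a map keyed by md5 digest, and a `Set` of paths stands in for the "hash map of processed image file paths".

```typescript
import { createHash } from "crypto";

// Option 1: feed the file contents and then the file path into the same md5 digest,
// so identical bytes stored at different paths produce different keys.
function hashContentAndPath(filePath: string, content: Buffer): string {
  return createHash("md5").update(content).update(filePath).digest("hex");
}

// Option 2: keep the content-only md5 and track processed paths separately.
function hashContent(content: Buffer): string {
  return createHash("md5").update(content).digest("hex");
}

function isProcessed(
  filePath: string,
  content: Buffer,
  imageData: Map<string, unknown>, // assumed shape: md5 digest -> image record
  processedPaths: Set<string> // hypothetical set of already-processed file paths
): boolean {
  // Processed only if both the content hash and the path have been seen.
  return imageData.has(hashContent(content)) && processedPaths.has(filePath);
}
```

With option 2, a re-uploaded copy has a known md5 but an unknown path, so `isProcessed` returns false and the copy can still be offered for optimization.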

Edit:

This will be a bit tricky as the images are indexed in a hash map with their md5 hash digest as the key. This occurs during the image scanning process: https://github.com/getflywheel/local-addon-image-optimizer/blob/master/src/main/scanImagesProcess.ts#L12-L40

That said, the second approach would need to change a bit, since I had forgotten exactly how images were indexed when I originally wrote that.
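
One way the second approach might be adapted, purely as a sketch and not a decided design: keep the md5 digest as the top-level key, but store a record per file path under each digest. The `PathRecord` shape and both function names here are hypothetical and are not taken from scanImagesProcess.ts.

```typescript
import { createHash } from "crypto";

// Hypothetical per-path record; the addon's real image data shape is not shown in this thread.
interface PathRecord {
  optimized: boolean;
}

// md5 digest -> (file path -> record). The digest remains the top-level key.
type DigestIndex = Map<string, Map<string, PathRecord>>;

function addScannedFile(index: DigestIndex, filePath: string, content: Buffer): void {
  const digest = createHash("md5").update(content).digest("hex");
  let byPath = index.get(digest);
  if (byPath === undefined) {
    byPath = new Map<string, PathRecord>();
    index.set(digest, byPath);
  }
  // Only add a record for paths not seen before, so existing state is preserved.
  if (!byPath.has(filePath)) {
    byPath.set(filePath, { optimized: false });
  }
}

// Collect every path that has been scanned but not yet optimized.
function pendingPaths(index: DigestIndex): string[] {
  const pending: string[] = [];
  index.forEach((byPath) => {
    byPath.forEach((record, filePath) => {
      if (!record.optimized) {
        pending.push(filePath);
      }
    });
  });
  return pending;
}
```

Under this layout, a duplicate upload adds a second path under an existing digest instead of being dropped, so it still shows up as pending.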