KilianB/JImageHash

GIFs getting the same hash when the first frame is identical

anatolyra opened this issue · 4 comments

Hi,

When generating hash values for two GIF files, I'm getting the same hash value for both if their first frames are identical.
(Attached images: two_dogs_1, two_dogs_2)

Is that the expected behavior?

Thanks!

It currently is, but maybe we can change it to something you deem appropriate.

What behavior would you like?

Create a single hash for the entire gif?

  • Concatenate different hashes for each frame? (The same GIF will match, but a different frame order won't; this also prevents comparing GIFs with different numbers of frames.)
  • Create a hash object for each individual frame and group them in some way, so that we can compare all images as well as a single image within the GIFs? (Search whether individual images match within a collection?)
  • ....

My suggestion is to create a "gif hash collection" allowing for different similarity distances.

  • intersect: find image matches contained in both GIF collections
  • distinct: 1 - intersect
  • totalDistance: summed distance, frame by frame
  • minDistance: summed distance from each frame to its closest frame
  • distanceShifted: total distance, but shifted to produce the lowest value
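
To make the three summed measures concrete, here is a minimal sketch, not JImageHash code. It assumes each frame hash fits into a 64-bit `long` and is compared by Hamming distance; the class `GifHashDistances` and its method signatures are hypothetical. It also reads "shifted" as a cyclic shift of the second frame list, which is only one possible interpretation.

```java
/** Hypothetical helper (not part of JImageHash): distances between per-frame hashes. */
final class GifHashDistances {

    private GifHashDistances() {}

    /** Hamming distance between two 64-bit frame hashes (assumed hash representation). */
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** totalDistance: sum of distances between frames at the same index. */
    static int totalDistance(long[] a, long[] b) {
        requireSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += hamming(a[i], b[i]);
        }
        return sum;
    }

    /** minDistance: for each frame of a, add the distance to its closest frame in b. */
    static int minDistance(long[] a, long[] b) {
        if (b.length == 0) {
            throw new IllegalArgumentException("b has no frames");
        }
        int sum = 0;
        for (long frameA : a) {
            int best = Integer.MAX_VALUE;
            for (long frameB : b) {
                best = Math.min(best, hamming(frameA, frameB));
            }
            sum += best;
        }
        return sum;
    }

    /** distanceShifted: totalDistance minimised over all cyclic shifts of b (assumption). */
    static int distanceShifted(long[] a, long[] b) {
        requireSameLength(a, b);
        if (a.length == 0) {
            return 0;
        }
        int best = Integer.MAX_VALUE;
        for (int shift = 0; shift < b.length; shift++) {
            int sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += hamming(a[i], b[(i + shift) % b.length]);
            }
            best = Math.min(best, sum);
        }
        return best;
    }

    private static void requireSameLength(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("frame counts differ");
        }
    }
}
```

With this reading, two GIFs that contain the same frames in rotated order get a large `totalDistance` but a `distanceShifted` and `minDistance` of zero. Only `minDistance` works for GIFs with different frame counts.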

I like what you suggest. A couple of things:

  1. intersect - to find image matches in both collections, you'll have to let the caller supply a minimum similarity value
  2. What do you mean by distinct?
  3. Maybe also return the average distance and variance?

Thanks!

I define distinct as the inverse operation of intersection. Return all images which are unique to one collection.
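
A rough sketch of intersect and distinct with a caller-supplied threshold, as requested above. It is not JImageHash code: the class `GifFrameSets` is hypothetical, frame hashes are assumed to be 64-bit `long` values, and the "minimum similarity" is expressed as a maximum Hamming distance. Here distinct returns the frames of either collection that have no match in the other, which is one way to read "unique to one collection".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JImageHash): set-like operations on frame hashes. */
final class GifFrameSets {

    private GifFrameSets() {}

    /** True if frame is within maxDistance (Hamming) of at least one hash in others. */
    static boolean matchesAny(long frame, long[] others, int maxDistance) {
        for (long other : others) {
            if (Long.bitCount(frame ^ other) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    /** intersect: frames of a that have a sufficiently similar frame in b. */
    static List<Long> intersect(long[] a, long[] b, int maxDistance) {
        List<Long> result = new ArrayList<>();
        for (long frame : a) {
            if (matchesAny(frame, b, maxDistance)) {
                result.add(frame);
            }
        }
        return result;
    }

    /** distinct: frames of a without a match in b, followed by frames of b without a match in a. */
    static List<Long> distinct(long[] a, long[] b, int maxDistance) {
        List<Long> result = new ArrayList<>();
        for (long frame : a) {
            if (!matchesAny(frame, b, maxDistance)) {
                result.add(frame);
            }
        }
        for (long frame : b) {
            if (!matchesAny(frame, a, maxDistance)) {
                result.add(frame);
            }
        }
        return result;
    }
}
```

Note that intersect as written is not symmetric: it reports the matching frames of the first collection only.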

The issue tracker serves as notes and comments, so don't worry if it gets a bit messy. I am just writing down random thoughts.

Coding all of this is trivial and can be done quickly; the issue arises from a design perspective:

  • I would prefer to create a new class, similar to FuzzyHash, which groups hashes together. The hash object can easily be returned from the hashing algorithm if it extends Hash. If we do this, maybe it's time to introduce a new abstract superclass for hash collections.
  • Searching for images is more a feature of an `ImageMatcher` than of a hash object. Semantically, creating a hash object for this bothers me a tiny bit (we could query whether an image is contained in the GIF).
  • The base functionality of the default hash object is still valid, so inheritance is the way to go, but at the same time it is also a composition.

Note: This link explains how frames can be extracted from GIF images: https://stackoverflow.com/questions/8933893/convert-each-animated-gif-frame-to-a-separate-bufferedimage . This method requires a file as input; we should also provide a utility loader for GIF images so that users don't have to perform the same file I/O multiple times if they want to hash the same GIF with multiple algorithms. Are there any GIF containers available, or should we create our own BufferedImage collection?
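
As a starting point for such a loader, here is a minimal sketch using only the standard `javax.imageio` API. The class `GifFrameLoader` and the method `readFrames` are hypothetical names, not existing JImageHash code. It reads the file once and returns all frames as a plain `List<BufferedImage>`, which can then be passed to several hashing algorithms. One caveat: `ImageReader.read(i)` returns each frame as it is stored, so for GIFs that only store the changed region per frame, the frames would still need to be composited using the image metadata (offsets and disposal method), which this sketch does not do.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical utility (not part of JImageHash): load every stored frame of a GIF once. */
final class GifFrameLoader {

    private GifFrameLoader() {}

    /** Reads all frames of the given GIF file without compositing partial frames. */
    static List<BufferedImage> readFrames(File gif) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(gif)) {
            if (in == null) {
                throw new IOException("Cannot open image stream for " + gif);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
            if (!readers.hasNext()) {
                throw new IOException("No GIF reader available");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // allowSearch = true lets the reader scan the stream to count the frames
                int frameCount = reader.getNumImages(true);
                List<BufferedImage> frames = new ArrayList<>(frameCount);
                for (int i = 0; i < frameCount; i++) {
                    frames.add(reader.read(i));
                }
                return frames;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Because `ImageIO.createImageInputStream` also accepts an `InputStream`, the same approach could be adapted for callers that do not have a file on disk.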

Do we want to overload the hash method of the hashing algorithms so that it checks whether the supplied image is a GIF and creates the appropriate GifHashCollection, or create an entirely new method, hashGif?