GIFs getting the same hash when the first frame is identical
anatolyra opened this issue · 4 comments
It currently is, but maybe we can alter it to something you deem appropriate.
What behavior would you like?
- Create a single hash for the entire gif?
- Concatenate the hashes of the individual frames? (The same gif will match, but a different frame order won't, and this prevents comparing gifs with different numbers of frames.)
- Create a hash object for each individual frame and group them in some way, so that whole gifs can be compared as well as single images within them? (Search whether individual images match within a collection?)
- ....
My suggestion is to create a "gif hash collection" allowing for different similarity distances.
- intersect: find image matches contained in both gif collections
- distinct: 1 - intersect
- totalDistance: summed distance frame by frame
- minDistance: summed distance for each frame to the closest frame
- distanceShifted: total distance but shifted to create the lowest value
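To make the three distance variants concrete, here is a minimal sketch. It rests on assumptions that are not part of the proposal: each frame hash is modeled as a 64-bit `long`, the per-frame distance is the Hamming distance, and "shifted" is read as a cyclic shift of the frame order. `GifFrameDistances` is a hypothetical name, not an existing class.

```java
import java.util.List;

/** Hypothetical helper; each frame hash is modeled as a 64-bit long (assumption). */
final class GifFrameDistances {

    private GifFrameDistances() {}

    /** Hamming distance: number of bits in which two frame hashes differ. */
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** totalDistance: summed distance frame by frame; requires equal frame counts. */
    static int totalDistance(List<Long> a, List<Long> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("frame counts differ");
        }
        int sum = 0;
        for (int i = 0; i < a.size(); i++) {
            sum += hamming(a.get(i), b.get(i));
        }
        return sum;
    }

    /** minDistance: for each frame of a, add the distance to its closest frame in b. */
    static int minDistance(List<Long> a, List<Long> b) {
        if (b.isEmpty()) {
            throw new IllegalArgumentException("b has no frames");
        }
        int sum = 0;
        for (long frameA : a) {
            int best = Integer.MAX_VALUE;
            for (long frameB : b) {
                best = Math.min(best, hamming(frameA, frameB));
            }
            sum += best;
        }
        return sum;
    }

    /** distanceShifted: lowest frame-by-frame total over all cyclic shifts of b (assumed reading). */
    static int distanceShifted(List<Long> a, List<Long> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("frame counts differ");
        }
        int n = a.size();
        if (n == 0) {
            return 0;
        }
        int best = Integer.MAX_VALUE;
        for (int shift = 0; shift < n; shift++) {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += hamming(a.get(i), b.get((i + shift) % n));
            }
            best = Math.min(best, sum);
        }
        return best;
    }
}
```

With this reading, a gif whose frames are merely rotated scores 0 for minDistance and distanceShifted, while totalDistance still reports a difference. Note that minDistance as sketched is not symmetric: swapping the arguments can change the result.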
I like what you suggest. A couple of things:
- intersect - to find image matches in both collections, you'll have to allow for giving a minimum similarity value
- What do you mean by distinct?
- Maybe give a result of average distance and variance?
Thanks!
I define distinct as the inverse operation of intersection. Return all images which are unique to one collection.
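A rough sketch of the thresholded intersect, its inverse distinct, and the average/variance summary suggested above. Assumptions: frame hashes are 64-bit `long` values, the threshold is expressed as a maximum Hamming distance (equivalent to a minimum similarity for fixed-length hashes), and the variance is the population variance. `GifFrameSets` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; each frame hash is modeled as a 64-bit long (assumption). */
final class GifFrameSets {

    private GifFrameSets() {}

    /** True if some frame in others is within maxDistance differing bits of frame. */
    static boolean hasMatch(long frame, List<Long> others, int maxDistance) {
        for (long other : others) {
            if (Long.bitCount(frame ^ other) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    /** intersect: frames of a that have a sufficiently close frame in b. */
    static List<Long> intersect(List<Long> a, List<Long> b, int maxDistance) {
        List<Long> result = new ArrayList<>();
        for (long frame : a) {
            if (hasMatch(frame, b, maxDistance)) {
                result.add(frame);
            }
        }
        return result;
    }

    /** distinct: frames of a with no sufficiently close frame in b (inverse of intersect). */
    static List<Long> distinct(List<Long> a, List<Long> b, int maxDistance) {
        List<Long> result = new ArrayList<>();
        for (long frame : a) {
            if (!hasMatch(frame, b, maxDistance)) {
                result.add(frame);
            }
        }
        return result;
    }

    /** Mean of a non-empty array of frame distances. */
    static double mean(int[] distances) {
        double sum = 0;
        for (int d : distances) {
            sum += d;
        }
        return sum / distances.length;
    }

    /** Population variance of a non-empty array of frame distances. */
    static double variance(int[] distances) {
        double m = mean(distances);
        double sum = 0;
        for (int d : distances) {
            sum += (d - m) * (d - m);
        }
        return sum / distances.length;
    }
}
```

Both set operations are one-sided here: `distinct(a, b, t)` returns the frames unique to `a`. Calling it again with the arguments swapped yields the frames unique to `b`.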
The issue tracker serves as notes and comments, therefore don't worry if it gets a bit messy. I am just writing down random thoughts.
Coding all of this is trivial and can be done within a short time; the issue arises from a design perspective:
- For my liking I would create a new class similar to FuzzyHash which groups hashes together. The hash object can be returned from the hashing algorithm easily if it extends Hash. If this is done, maybe it's time to implement a new abstract super class `hash collection`.
- Searching for images is more a feature of an `ImageMatcher` than of a hash object. Semantically, creating a hash object bothers me a tiny bit (we could query if an image is contained in the gif).
- The base functionality of the default hash object is still valid, therefore inheritance is the way to go, but at the same time it's also a composition.
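The "inheritance plus composition" idea from the points above could look roughly like this. `SimpleHash` and `SimpleGifHash` are hypothetical stand-ins written for this sketch, not the library's `Hash` or `FuzzyHash` classes. Using the first frame as the base hash value is likewise only an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, heavily simplified stand-in for a hash object. */
class SimpleHash {
    private final long bits;

    SimpleHash(long bits) {
        this.bits = bits;
    }

    long bits() {
        return bits;
    }

    /** Hamming distance to another hash. */
    int distance(SimpleHash other) {
        return Long.bitCount(bits ^ other.bits());
    }
}

/** Is a hash (inheritance) and also holds one hash per frame (composition). */
final class SimpleGifHash extends SimpleHash {
    private final List<SimpleHash> frames;

    /** Requires at least one frame; the first frame supplies the base hash value (assumption). */
    SimpleGifHash(List<SimpleHash> frames) {
        super(frames.get(0).bits());
        this.frames = Collections.unmodifiableList(new ArrayList<>(frames));
    }

    List<SimpleHash> frames() {
        return frames;
    }

    /** Frame-by-frame sum against another gif hash of equal length, else the base behavior. */
    @Override
    int distance(SimpleHash other) {
        if (other instanceof SimpleGifHash) {
            List<SimpleHash> otherFrames = ((SimpleGifHash) other).frames();
            if (otherFrames.size() == frames.size()) {
                int sum = 0;
                for (int i = 0; i < frames.size(); i++) {
                    sum += frames.get(i).distance(otherFrames.get(i));
                }
                return sum;
            }
        }
        return super.distance(other);
    }

    /** Query whether a single image hash matches any frame within maxDistance bits. */
    boolean containsFrame(SimpleHash image, int maxDistance) {
        for (SimpleHash frame : frames) {
            if (frame.distance(image) <= maxDistance) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch two gifs with an identical first frame but different later frames no longer compare as equal, while comparing a gif hash against a plain image hash still falls back to the existing single-hash behavior.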
Note: This link explains how frames can be extracted from gif images: https://stackoverflow.com/questions/8933893/convert-each-animated-gif-frame-to-a-separate-bufferedimage . This method requires a file as input; we should also provide a utility loader for gif images so the user does not have to perform the same file IO multiple times if they want to hash the same gif with multiple algorithms. Are there any gif containers available, or should we create our own BufferedImage collection?
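A minimal sketch of such a utility loader, using only the standard `javax.imageio` API. `GifFrameLoader` is a hypothetical name. One caveat: `ImageReader.read(i)` returns each frame as stored, so gifs whose later frames contain only the changed region would still need the compositing step discussed in the linked answer.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical loader: reads all frames once so several algorithms can reuse them. */
final class GifFrameLoader {

    private GifFrameLoader() {}

    static List<BufferedImage> readFrames(File gif) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(gif)) {
            if (in == null) {
                throw new IOException("cannot open " + gif);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("no image reader found for " + gif);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // allowSearch = true: scan the stream to count the frames
                int count = reader.getNumImages(true);
                List<BufferedImage> frames = new ArrayList<>(count);
                for (int i = 0; i < count; i++) {
                    // raw frame as stored in the file, not composited
                    frames.add(reader.read(i));
                }
                return frames;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The returned list can then be handed to any number of hashing algorithms without touching the file again.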
Do we want to overload the hash method of hashing algorithms, checking if the supplied image is a gif and creating the appropriate GifHashCollection, or create an entirely new method hashGif?
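For comparison, the separate-method option might look like the sketch below. `FrameHasher` is a hypothetical interface invented for illustration, with hashes modeled as `long` values. It is not the library's hashing algorithm type.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a hashing algorithm. */
interface FrameHasher {

    /** Existing-style entry point: one image in, one hash out. */
    long hash(BufferedImage image);

    /** Separate entry point: hashes every frame and returns the hashes in frame order. */
    default List<Long> hashGif(List<BufferedImage> frames) {
        List<Long> hashes = new ArrayList<>(frames.size());
        for (BufferedImage frame : frames) {
            hashes.add(hash(frame));
        }
        return hashes;
    }
}
```

A separate method keeps the return type of `hash` unchanged and needs no format detection. Overloading `hash` would instead have to decide at runtime whether the input is a gif.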