rhsimplex/image-match

Duplicate Identifier or Similarity Identifier

alexminnaar opened this issue · 2 comments

I was hoping to get some clarification on the intended use case for this. Should it be used strictly for duplicate detection, or can it also be used to identify similar images? This page seems to suggest that it can be used to measure image similarity. However, when I try it on the attached images, the results do not agree with the intuition that two images of shoes should be significantly more similar than an image of a shoe and an image of something else.

68510a677540a15fdeeafad1ff381e250653e27f
88544b108b4e80844e2e43d48d21db8f99506dc9
fefd7feddf373423c20ea759c0a290003325372a

The distance between the first and second image seems to be 0.71422605625006175 but the distance between the first and third is 0.70043762770711271.

Hi @alexminnaar, yes, the intended use is near-duplicate detection. The original use case was detecting copyright violations across a corpus of over a billion images.
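To make the near-duplicate point concrete, here is a minimal sketch of how a normalized distance between two signatures can be turned into a match decision. It assumes the distance has the form ||a - b|| / (||a|| + ||b||), which is my understanding of what the library computes, and it assumes a cutoff of 0.40; that cutoff does not come from this thread and should be tuned per corpus. The helper names `normalized_distance` and `is_near_duplicate` are hypothetical and written in plain Python rather than taken from the library.

```python
import math

# Assumed cutoff for "near-duplicate"; not stated in this thread, tune it for your data.
NEAR_DUPLICATE_CUTOFF = 0.40


def normalized_distance(a, b):
    """Return ||a - b|| / (||a|| + ||b||) for two equal-length signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    norm_diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a + norm_b == 0:
        return 0.0  # two all-zero signatures are treated as identical
    return norm_diff / (norm_a + norm_b)


def is_near_duplicate(distance, cutoff=NEAR_DUPLICATE_CUTOFF):
    """A pair counts as a near-duplicate only when its distance is below the cutoff."""
    return distance < cutoff


# The two distances reported above in this issue.
d_first_second = 0.71422605625006175
d_first_third = 0.70043762770711271

print(is_near_duplicate(d_first_second))  # False under the assumed cutoff
print(is_near_duplicate(d_first_third))   # False under the assumed cutoff
```

Under that assumed cutoff, both reported pairs land well outside the near-duplicate range, so the small gap between 0.714 and 0.700 should not be read as a ranking of visual or semantic similarity.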

Here's a video from PyData explaining more.

Sorry about the confusion, I'll link this issue from the README to help others.

@alexminnaar if you are interested, you can use something like this: https://github.com/akshayubhat/DeepVideoAnalytics