Duplicate Identifier or Similarity Identifier

Question

alexminnaar opened this issue 8 years ago · 2 comments

I was hoping to get some clarification on the intended use case for this. Should it be used strictly for duplicate detection, or can it also be used to identify similar images? This page seems to suggest that it can be used to measure image similarity. However, when I try it on the attached images, the results do not agree with the intuition that two images of shoes should be significantly more similar to each other than an image of a shoe and an image of something else.

The distance between the first and second image seems to be 0.71422605625006175 but the distance between the first and third is 0.70043762770711271.

Answer 1 · 2017-02-24T09:04:37.000Z

Hi @alexminnaar, yes, the intended use is detecting near-duplicate images, not measuring general visual similarity. The original use case was detecting copyright violations over a corpus of more than a billion images.

Here's a video from PyData explaining more.

Sorry about the confusion, I'll link this issue from the README to help others.
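
To illustrate how the two distances from the question read under a near-duplicate interpretation, here is a minimal sketch. It assumes a normalized distance of the form ||a - b|| / (||a|| + ||b||), which ranges from 0 to 1, and a cutoff of 0.4; both the formula and the cutoff are assumptions for illustration and are not confirmed anywhere in this thread. The function names are hypothetical.

```python
import math

# Assumed cutoff for "near-duplicate"; hypothetical, not stated in this thread.
NEAR_DUPLICATE_CUTOFF = 0.4


def normalized_distance(a, b):
    """Assumed distance: ||a - b|| / (||a|| + ||b||), in the range [0, 1]."""
    diff_norm = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    denom = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    if denom == 0:
        # Two all-zero signatures are treated as identical.
        return 0.0
    return diff_norm / denom


def is_near_duplicate(distance, cutoff=NEAR_DUPLICATE_CUTOFF):
    """Return True when the distance falls below the assumed cutoff."""
    return distance < cutoff


# Distances reported in the question.
shoe_vs_shoe = 0.71422605625006175
shoe_vs_other = 0.70043762770711271

print(is_near_duplicate(shoe_vs_shoe))
print(is_near_duplicate(shoe_vs_other))
```

Under these assumptions both pairs land well above the cutoff, so both simply read as "not a near-duplicate". The small gap between 0.714 and 0.700 carries no information about which pair looks more alike.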

Answer 2 · 2017-03-23T16:26:35.000Z

@alexminnaar if you are interested in image similarity, you can use something like this: https://github.com/akshayubhat/DeepVideoAnalytics