discussion on different concepts results

Question


bakachan19 opened this issue 2 years ago · 2 comments

Hi.

Thank you for this library. It is really helpful.
I am using concept modeling to cluster images and do some analysis on the results.
I adapted find_concepts(), which was originally meant to find the top 5 concepts related to a search term, so that it finds the top 5 concepts related to an image: I simply pass the path to an image and obtain the image's embedding with the embedding model.
However, I noticed that in many cases the top-1 most related concept is different from the cluster returned by fit_transform(). Sometimes that cluster is in second position, but often it is ranked third or lower. Any idea why this might be happening?

Thank you for your time.
Best wishes.

Answer 1 · 2023-05-07T12:58:38.000Z

The find_concepts function is merely a quick search function and does not behave the same way as .transform. Instead, .find_concepts computes the cosine similarity between the image embedding and the concept embeddings to quickly find a match. This is not an exact representation of the training process during .fit, which involves dimensionality reduction and clustering.
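To illustrate why the two can disagree, here is a minimal toy sketch in Python. It assumes hypothetical 3-dimensional embeddings and hypothetical concept names (A, B, C). quick_search mimics the cosine-similarity lookup described above, while assign is a deliberately simplified stand-in for the .transform step: it first projects embeddings to a lower dimension (here, by just keeping the first two axes) and then picks the nearest concept. The projection and the nearest-concept rule are assumptions for illustration only, not the library's actual dimensionality-reduction or clustering models.

```python
import math

# Hypothetical 3-d concept embeddings (real image embeddings are much larger).
CONCEPTS = {
    "A": (0.0, 0.0, 1.0),
    "B": (1.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def quick_search(query, concepts, top_n=5):
    """Quick-search style lookup: rank concept names by cosine similarity alone."""
    sims = {name: cosine(query, emb) for name, emb in concepts.items()}
    return sorted(sims, key=sims.get, reverse=True)[:top_n]


def project_2d(vec):
    """Hypothetical stand-in for dimensionality reduction: keep the first two axes."""
    return vec[:2]


def assign(query, concepts):
    """Simplified transform-style step (an assumption, not the library's model):
    project first, then return the concept whose projected embedding is
    nearest to the projected query in Euclidean distance."""
    reduced_query = project_2d(query)
    return min(
        concepts,
        key=lambda name: math.dist(reduced_query, project_2d(concepts[name])),
    )


query = (1.0, 0.0, 2.0)  # hypothetical image embedding
ranking = quick_search(query, CONCEPTS)
assigned = assign(query, CONCEPTS)
print(ranking)                      # ['A', 'B', 'C']
print(assigned)                     # B
print(ranking.index(assigned) + 1)  # 2
```

Because the two procedures score the same embedding in different spaces, the concept that assign picks ends up in second place of the cosine ranking here, which mirrors the kind of mismatch described in the question.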

Answer 2 · 2023-05-08T07:10:23.000Z

Ohh, I see.
Thank you!