MaartenGr/Concept

How can we get probabilities for all clusters in transform function?

suprateek-19 opened this issue · 3 comments

Currently we only get the predicted class through concept_model.transform().
Can we get the predicted probabilities for each cluster or the top n clusters?

That is currently not implemented. However, you can pass the internal hdbscan model (concept_model.hdbscan_model) to the module-level hdbscan.approximate_predict or hdbscan.membership_vector functions to extract the probabilities.

I get the following error when trying to access the above, which is also a known issue.
Is there any other way to get a probability distribution across concepts for images?
AttributeError: 'HDBSCAN' object has no attribute 'approximate_predict'

@shilpiag123 You should use it as follows:

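import hdbscan
# cluster_model: the fitted HDBSCAN model (e.g. concept_model.hdbscan_model), which needs prediction data (prediction_data=True)
# embeddings: the pre-calculated embeddings of the items to score
probabilities = hdbscan.membership_vector(cluster_model, embeddings)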

Having said that, you will have to access the cluster model and also pre-calculate the embeddings yourself. I would advise using BERTopic v0.15 instead, which now has support for topic modeling with images, very similar to Concept.
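
For the "top n clusters" part of the original question: hdbscan.membership_vector returns one row per item and one column per cluster, so the rows can be ranked directly. Below is a minimal sketch in plain Python, assuming that layout. The helper name top_n_clusters and the sample values are hypothetical and not part of Concept or hdbscan.

```python
from typing import List, Sequence, Tuple


def top_n_clusters(
    probabilities: Sequence[Sequence[float]], n: int = 3
) -> List[List[Tuple[int, float]]]:
    """Return the n most probable (cluster_index, probability) pairs per row."""
    ranked = []
    for row in probabilities:
        # Pair each probability with its column (cluster) index, then sort
        # by probability, highest first.
        pairs = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        ranked.append([(idx, float(p)) for idx, p in pairs[:n]])
    return ranked


# Made-up membership values for two items and four clusters.
example = [
    [0.05, 0.70, 0.20, 0.01],
    [0.40, 0.10, 0.45, 0.02],
]
top2 = top_n_clusters(example, n=2)
print(top2)
```

The same function should also accept the NumPy array returned by hdbscan.membership_vector, since it only iterates over rows and converts each value with float().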