How can we get probabilities for all clusters in transform function?
suprateek-19 opened this issue · 3 comments
Currently we only get the predicted class through concept_model.transform()
Can we get the predicted probabilities for each cluster or the top n clusters?
That is currently not implemented. However, you can use the internal hdbscan model (concept_model.hdbscan_model) to extract the probabilities using its approximate_predict or hdbscan.membership_vector functions.
I get the following error when trying to access the above, which is also a known issue.
Any other way to get probability distribution across concepts for images?
AttributeError: 'HDBSCAN' object has no attribute 'approximate_predict'
@shilpiag123 You should use it as follows:
import hdbscan
probabilities = hdbscan.membership_vector(cluster_model, embeddings)
Having said that, you will have to access the cluster model and also pre-calculate the embeddings. Instead, I would advise using BERTopic v0.15, which now has support for topic modeling with images, very similar to Concept.