Validation questions
mdagost opened this issue · 1 comments
mdagost commented
I'm using both `relative_validity_` and the full `validity_index` function from `hdbscan.validity`. @lmcinnes if they give different optimal parameters, is there a reason to prefer one over the other? Perhaps `validity_index`, because the other one is approximate?
My application is NLP clustering of embedding vectors, and one of the things I'm testing is the effect of different embedding vectors with different dimensionalities. Is it valid to use either of those metrics to compare across embeddings for the same dataset, or only across the hdbscan parameters themselves?
Thank you so much!