erdogant/clusteval

Method and metric

Michael-E-Rose opened this issue · 4 comments

I sense potential in this package and I am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.

I have a number of questions that I could not answer from the arguably short documentation:

  • It seems to me that I can pick either DBSCAN or, say, the silhouette score, but not both at the same time. This seems odd to me because DBSCAN is a clustering method whose results could be evaluated with the silhouette score.
  • Related to that question: How are clusters evaluated if I pick DBSCAN or HDBSCAN, and how are clusters computed if I pick the silhouette score or the Davies-Bouldin index?
  • How can I choose different distance metrics to plug into, e.g., DBSCAN or HDBSCAN?
  • How do I see which parameters got chosen?
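For reference, the clustering method and the evaluation index are separate steps in general: DBSCAN produces cluster labels, and any internal index such as the silhouette score or the Davies-Bouldin index can then score those labels. The sketch below uses scikit-learn directly rather than clusteval. The synthetic blobs, the `eps`/`min_samples` values and the choice of the Manhattan metric are illustrative assumptions, and noise points (label -1) are simply excluded before scoring.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic data (assumption): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.4,
    random_state=0,
)

# Clustering method: DBSCAN with an explicitly chosen distance metric.
labels = DBSCAN(eps=1.0, min_samples=5, metric="manhattan").fit_predict(X)

# DBSCAN marks noise with -1; score only the points assigned to a cluster.
mask = labels != -1
n_clusters = len(set(labels[mask]))

# Evaluation methods: two internal indices applied to the same labels.
sil = silhouette_score(X[mask], labels[mask], metric="manhattan")
dbi = davies_bouldin_score(X[mask], labels[mask])  # always Euclidean centroids

print(f"clusters={n_clusters} silhouette={sil:.3f} davies_bouldin={dbi:.3f}")
```

Higher silhouette values are better, while lower Davies-Bouldin values are better, so the two indices are read in opposite directions.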

Thanks for the answers! Questions 1 and 3 are about very similar things, so I take it that my current application (DBSCAN on geo-coordinates, which requires the haversine distance) does not work with clusteval.
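As a point of comparison outside clusteval, scikit-learn's `DBSCAN` accepts `metric="haversine"` (with `algorithm="ball_tree"`) on `[lat, lon]` pairs given in radians, so `eps` has to be expressed as an angle, i.e. a distance divided by the Earth's radius. A minimal sketch, assuming six hypothetical points near Amsterdam and Rotterdam and a 5 km neighborhood radius:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical (lat, lon) points in degrees: three near Amsterdam, three near Rotterdam.
coords_deg = np.array([
    [52.37, 4.89], [52.38, 4.90], [52.36, 4.88],
    [51.92, 4.48], [51.93, 4.47], [51.91, 4.49],
])

# scikit-learn's haversine metric expects [lat, lon] in radians.
coords_rad = np.radians(coords_deg)

# eps is an angle on the unit sphere, so convert the 5 km radius to radians.
eps_km = 5.0
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
)
labels = db.fit_predict(coords_rad)

# The same metric can be reused when scoring the result.
score = silhouette_score(coords_rad, labels, metric="haversine")
print(labels, round(score, 3))
```

Because both `DBSCAN` and `silhouette_score` take the metric as a parameter, the same distance is used for clustering and for evaluation.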

With the fourth question I wanted to know which hyperparameter values were chosen in the end: with DBSCAN, for example, I assume multiple values of "eps" are tested, so I am keen to learn which one yielded the best result.
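The kind of search described here can be sketched as a plain loop over candidate `eps` values that records every result and keeps the value with the highest silhouette score. This is an illustration with scikit-learn, not how clusteval performs its search; the `eps` grid, `min_samples=5` and the synthetic data are assumptions. Excluding noise points before scoring can favor settings that discard many points, so the fraction labeled as noise is worth checking as well.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): three 2-D blobs.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.4, random_state=0)

eps_grid = np.linspace(0.1, 2.0, 20)  # hypothetical candidate values
results = []  # one (eps, n_clusters, silhouette) tuple per scored value

for eps in eps_grid:
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    mask = labels != -1  # ignore noise points when scoring
    n_clusters = len(set(labels[mask]))
    # The silhouette score needs at least 2 clusters and fewer clusters than points.
    if n_clusters < 2 or n_clusters >= mask.sum():
        continue
    score = silhouette_score(X[mask], labels[mask])
    results.append((float(eps), n_clusters, float(score)))

# Pick the eps with the highest silhouette score.
best_eps, best_k, best_score = max(results, key=lambda r: r[2])
print(f"best eps={best_eps:.2f} clusters={best_k} silhouette={best_score:.3f}")
```

Keeping `results` around shows every evaluated `eps`, not only the winning one.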

I released a new update that has separate input parameters for the clustering method and the evaluation method!
More information can be found here: https://github.com/erdogant/clusteval/releases/tag/2.0.0

Update to the latest release with:

pip install -U clusteval

The version should be >= 2.0.0.

import clusteval
print(clusteval.__version__)
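To check the installed version programmatically rather than by eye, a small stdlib-only sketch can compare the installed version string against 2.0.0. It assumes plain numeric `X.Y.Z` version strings (pre-release suffixes such as `rc1` are not handled), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Turn a numeric 'X.Y.Z' string into a 3-tuple of ints, padding missing parts with 0."""
    parts = [int(p) for p in v.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # so "2.0" compares equal to "2.0.0"
    return tuple(parts)


def has_min_version(v: str, minimum: str = "2.0.0") -> bool:
    """Return True if version string v is at least `minimum`."""
    return parse_version(v) >= parse_version(minimum)


try:
    installed = version("clusteval")
    print(installed, has_min_version(installed))
except PackageNotFoundError:
    print("clusteval is not installed")
```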