Clusteval provides methods for unsupervised cluster validation

clusteval

  • clusteval is a Python package for unsupervised cluster evaluation. Three methods are implemented for evaluating clusterings: silhouette, dbindex, and derivative. Four clustering methods can be used: agglomerative, kmeans, dbscan, and hdbscan.

Installation

  • Install clusteval from PyPI (recommended). clusteval is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.

  • It is distributed under the MIT license.

  • A new environment can be created as follows:

conda create -n env_clusteval python=3.6
conda activate env_clusteval
pip install clusteval
  • The beta version can be installed from the GitHub source:
git clone https://github.com/erdogant/clusteval
cd clusteval
pip install -U .

Import clusteval package

from clusteval import clusteval

Create example data set

# Generate random data
from sklearn.datasets import make_blobs
X, labx_true = make_blobs(n_samples=750, centers=4, n_features=2, cluster_std=0.5)

Cluster validation using Silhouette score

# Determine the optimal number of clusters

ce = clusteval(method='silhouette')
ce.fit(X)
ce.plot()
ce.dendrogram()
ce.scatter(X)
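
To make the idea behind silhouette-based validation concrete, here is a minimal sketch that scans candidate cluster counts with scikit-learn directly. It is not clusteval's internal code; the fixed blob centers, the range of k, and the names `scores` and `best_k` are assumptions chosen for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs with fixed centers (an assumption for this sketch,
# so that the outcome does not depend on randomly drawn centers).
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)

# Cluster the data for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 11):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The silhouette score is higher for better-separated clusterings,
# so the candidate with the maximum score is selected.
best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

The silhouette score lies between -1 and 1, so the selection step is a simple maximum over the candidates.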

Cluster validation using Davies-Bouldin index

# Determine the optimal number of clusters
ce = clusteval(method='dbindex')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
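
As an illustration of what the Davies-Bouldin index measures, the sketch below computes it by hand with NumPy and compares the result with scikit-learn's `davies_bouldin_score`. The helper `davies_bouldin`, the fixed blob centers, and the variable names are hypothetical and not part of clusteval.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

def davies_bouldin(X, labels):
    """Hand-written Davies-Bouldin index (hypothetical helper for illustration)."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    # Pairwise centroid distances; the diagonal is set to inf so that a
    # cluster is never compared with itself.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    # Worst-case similarity per cluster, averaged over all clusters.
    return ratios.max(axis=1).mean()

# Fixed centers are an assumption made to keep the sketch reproducible.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)

labels_by_k = {
    k: AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    for k in range(2, 11)
}
db_scores = {k: davies_bouldin(X, lab) for k, lab in labels_by_k.items()}

# Unlike the silhouette score, a lower Davies-Bouldin index is better.
best_k = min(db_scores, key=db_scores.get)
sklearn_score = davies_bouldin_score(X, labels_by_k[best_k])
print("best k:", best_k)
```

Because a lower index indicates compact, well-separated clusters, the selection step takes the minimum rather than the maximum.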

Cluster validation using derivative method

# Determine the optimal number of clusters
ce = clusteval(method='derivative')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
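
The README does not spell out how the derivative method works. One common elbow heuristic, assumed here to be the general idea, looks at the merge heights of a hierarchical clustering and picks the number of clusters where the second difference of those heights is largest. The sketch below uses SciPy; the fixed centers and the names `heights`, `acceleration`, and `k_elbow` are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Fixed centers are an assumption made to keep the sketch reproducible.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)

# Ward linkage; column 2 of Z holds the merge heights in ascending order.
Z = linkage(X, method="ward")
heights = Z[-10:, 2]  # heights of the last 10 merges

# Second difference ("acceleration") of the merge heights.
acceleration = np.diff(heights, n=2)

# Reversed, index 0 corresponds to k=2, index 1 to k=3, and so on.
k_elbow = int(acceleration[::-1].argmax()) + 2

# Cut the tree into k_elbow flat clusters.
labels = fcluster(Z, t=k_elbow, criterion="maxclust")
print("clusters at elbow:", k_elbow)
```

This heuristic only inspects the linkage matrix, so it needs no repeated clustering runs, but it can be misled when merge heights grow smoothly without a clear jump.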

Cluster validation using dbscan

# Determine the optimal number of clusters using dbscan and silhouette
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()

Cluster validation using hdbscan

hdbscan must be installed separately. It is not included in the clusteval setup file because it frequently causes installation issues.

pip install hdbscan

# Determine the optimal number of clusters
ce = clusteval(cluster='hdbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)

Citation

Please cite clusteval in your publications if it is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019clusteval,
  title={clusteval},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/clusteval}},
}

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a coffee for this work, it is much appreciated :) Star it if you like it!