Determine how many clusters your dataset should have using hierarchical clustering, and plot the clusters on a UMAP.

Primary language: Python. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

easy-clustering

Have you ever looked at your data and thought, "How many clusters should I even pass to K-means?" I got tired of hunting for the elbow in an elbow plot, so I wrote these functions. They perform agglomerative clustering on X, automatically choose a distance cutoff for defining clusters, and plot the clusters on both a dendrogram and a UMAP so you can inspect their quality.
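To make the idea concrete, here is a minimal sketch of picking a distance cutoff automatically. It is not necessarily the rule this repo uses. It assumes a simple "largest gap" heuristic: build the merge tree with SciPy, find the biggest jump between consecutive merge distances, and cut halfway across that jump. The function name `auto_cut_clusters` and the default Ward linkage are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def auto_cut_clusters(X, method="ward"):
    """Hypothetical helper: cluster X and cut the tree at the largest merge-distance gap."""
    X = np.asarray(X, dtype=float)
    if X.shape[0] < 3:
        raise ValueError("need at least 3 samples to compare merge distances")
    Z = linkage(X, method=method)      # (n - 1) x 4 table of merges
    heights = Z[:, 2]                  # merge distances, non-decreasing for Ward linkage
    gaps = np.diff(heights)            # jump between consecutive merges
    i = int(np.argmax(gaps))           # index of the largest jump
    cutoff = (heights[i] + heights[i + 1]) / 2.0  # cut halfway across that jump
    labels = fcluster(Z, t=cutoff, criterion="distance")
    return labels, cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic blobs, used only as demo data.
    X = np.vstack([
        rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
        rng.normal(loc=10.0, scale=0.1, size=(20, 2)),
    ])
    labels, cutoff = auto_cut_clusters(X)
    print("clusters found:", len(set(labels)), "cutoff:", round(float(cutoff), 2))
```

The same `cutoff` can be passed as `color_threshold` to `scipy.cluster.hierarchy.dendrogram` so the dendrogram colors match the cluster labels.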

Additionally, the plot_umap function accepts up to 4 y variables, which it plots on up to 4 subplots, so you can see how important features of your data line up with the clusters. The colorbars adapt automatically to binary vs. continuous features for better visualization.
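As a rough illustration of that multi-panel behavior, the sketch below colors a 2D embedding by up to 4 variables and switches the colormap for binary versus continuous values. It is not the repo's `plot_umap` implementation: the names `is_binary` and `plot_embedding_panels`, the "two or fewer unique values" test, and the specific colormaps are all assumptions. It takes a precomputed embedding so it runs without a UMAP dependency.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def is_binary(values):
    """Assumed rule: a variable is binary if it has at most two unique values."""
    return len(np.unique(np.asarray(values, dtype=float))) <= 2


def plot_embedding_panels(embedding, ys, names=None):
    """Hypothetical helper: one scatter panel per y variable (1 to 4 of them)."""
    if not 1 <= len(ys) <= 4:
        raise ValueError("expected between 1 and 4 y variables")
    embedding = np.asarray(embedding, dtype=float)
    fig, axes = plt.subplots(1, len(ys), figsize=(4 * len(ys), 4), squeeze=False)
    for k, (ax, y) in enumerate(zip(axes[0], ys)):
        y = np.asarray(y, dtype=float)
        binary = is_binary(y)
        # Diverging map for two-valued features, sequential map otherwise.
        cmap = "coolwarm" if binary else "viridis"
        sc = ax.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap=cmap, s=10)
        cbar = fig.colorbar(sc, ax=ax)
        if binary:
            cbar.set_ticks(np.unique(y))  # only label the values that occur
        ax.set_title(names[k] if names else f"y{k + 1}")
    return fig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 2))          # stand-in for a real 2D embedding
    flag = rng.integers(0, 2, size=50)      # binary feature
    score = rng.normal(size=50)             # continuous feature
    fig = plot_embedding_panels(emb, [flag, score], names=["flag", "score"])
    fig.savefig("panels.png")
```

With the umap-learn package installed, the embedding could come from `umap.UMAP(n_components=2).fit_transform(X)` instead of random points.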