This repository accompanies a blog post on Medium.
The blog post proposes a supervised clustering method that partitions data points into a limited number of clusters with respect to a target variable, based on the features specified by the user.
The resulting clusters have the following characteristics:
- The target variable has low variance within a cluster, but has high variance between clusters, and
- The data points in a cluster share similar values in the features that are relevant to distinguish the target.
The method is robust to irrelevant features and to correlated features. It also helps make machine learning models more interpretable.
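
The first characteristic above can be checked for any given cluster assignment. The sketch below is not the blog post's clustering algorithm; it is only a minimal, standard-library illustration of how the target's variance splits into a within-cluster part and a between-cluster part. The function name `target_variance_by_cluster` and the toy labels and target values are assumptions made for this example.

```python
from collections import defaultdict


def target_variance_by_cluster(labels, target):
    """Split the target's (population) variance into within- and between-cluster parts.

    Hypothetical helper for illustration only; not part of the repo's library.
    """
    if len(labels) != len(target) or len(target) == 0:
        raise ValueError("labels and target must be non-empty and equally long")

    # Group the target values by their cluster label.
    groups = defaultdict(list)
    for label, y in zip(labels, target):
        groups[label].append(y)

    n = len(target)
    overall_mean = sum(target) / n

    within = 0.0   # squared deviations from each cluster's own mean
    between = 0.0  # squared deviations of cluster means from the overall mean
    for values in groups.values():
        cluster_mean = sum(values) / len(values)
        within += sum((y - cluster_mean) ** 2 for y in values)
        between += len(values) * (cluster_mean - overall_mean) ** 2

    # within + between equals the total population variance of the target.
    return within / n, between / n


# Toy example (made-up data): two clusters and a binary churn-like target.
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
target = [0, 0, 0, 1, 1, 1, 1, 1]
within, between = target_variance_by_cluster(labels, target)
print(f"within-cluster variance:  {within:.4f}")
print(f"between-cluster variance: {between:.4f}")
```

A clustering with the properties listed above should give a small within-cluster value relative to the between-cluster value.
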
Repository contents:
- /datasets/churn_simulated: The sample dataset used in the blog post.
- /notebooks: The notebook used in the blog post.
- /blog_medium: The draft of the blog post in LaTeX format.
- /codes: A Python library that implements the supervised clustering method (work in progress).