This project analyses a home loans dataset to identify high-default clusters among customers. We begin by performing EDA on the dataset, then apply clustering to identify customer segments, and conclude with our findings.
- Obtained dataset from Kaggle.
- Performed EDA & imputed missing values using similar entries.
- Used k-means clustering to identify clusters with different default risks.
- Identified optimal k value using elbow method & silhouette score.
- Obtained a cluster with 80% good loans & explored its characteristics.
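
The k-selection step above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes scikit-learn, and it uses a small synthetic array `X` as a stand-in for the preprocessed loan features (the feature values, the range of k, and the random seeds are all assumptions made for the example).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed loan features (NOT the project's data):
# three groups of 100 points each in two hypothetical numeric features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(100, 2)) for c in centers])

# k-means is distance-based, so put the features on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score per k (higher is better)
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# The elbow is usually read off a plot of inertia against k;
# the silhouette score gives a second, numeric criterion.
best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k with highest silhouette score:", best_k)
```

On real data the two criteria do not always agree, so it is common to inspect both before settling on a k value.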
Quick Links:
- Read project online (recommended for viewing)
Alternatively, the following files are also available to view or download in the repo.
Some snapshots from the project can be found below:
In conclusion, if the firm would like to launch a home loans campaign, it should focus on advertising to married people living in semiurban or urban areas, as these characteristics were found among the groups with a higher share of good loans.