/bank_customers_churn_prediction_exploring_7_different_classification_algorithms


Primary Language: Jupyter Notebook

bank_customers_churn_prediction_exploring_7_different_classification_algorithms

This project classifies bank customers according to whether they will leave the bank (i.e., churn) or not.

Project Life-Cycle:

This project implements the following steps of a data science project life cycle:

1. Data Exploration, Analysis and Visualisations

2. Data Pre-processing

3. Data Preparation for the Modelling

4. Model Training

5. Model Validation

6. Optimized Model Selection based on Various Performance Metrics

7. Applying the Best Optimized Model to Unseen Test Data

8. Evaluating the Optimized Model’s Performance Metrics
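The data preparation, training and validation steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from `make_classification`, the 60/20/20 split, the `StandardScaler` step and the single decision tree are all assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank customer data (assumption): 1 = churn, 0 = stay.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)

# Step 3: hold out an unseen test set first, then split off a validation set.
# Stratifying keeps the churn ratio similar in every subset.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

# Step 2: fit the scaler on the training data only to avoid leaking
# validation or test statistics into training.
scaler = StandardScaler().fit(X_train)

# Step 4: train one candidate model.
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(scaler.transform(X_train), y_train)

# Step 5: validate on data the model has not seen during training.
val_accuracy = accuracy_score(y_val, model.predict(scaler.transform(X_val)))
print(f"train={len(X_train)} val={len(X_val)} test={len(X_test)}")
print(f"validation accuracy: {val_accuracy:.3f}")
```

The test set is left untouched here; it is only used once a final model has been selected.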

The business problem of determining the churn status of bank customers is explored with the 7 classification algorithms/models listed below. Each model is trained and validated, and the final optimized model is selected based on several performance metrics, namely accuracy, precision, recall and F1-score.

1. Decision Tree Classifier - CART (Classification and Regression Tree) Algorithm

2. Decision Tree Classifier - ID3 (Iterative Dichotomiser 3) Algorithm

3. Ensemble Random Forest Classifier Algorithm

4. Ensemble Adaptive Boosting Classifier Algorithm

5. Ensemble Histogram-based (Hist) Gradient Boosting Classifier Algorithm

6. Ensemble Extreme Gradient Boosting (XGBoost) Classifier Algorithm

7. Support Vector Machine (SVM) Classifier Algorithm

Performance Metrics and Results of all the Optimized Classifier Models:

Accuracy Metrics of all the Optimized Classifier Models:

Selection and Decision on Final Optimized Classifier Model for Deployment:

As the results above show, the "Ensemble Random Forest Classifier Model" and the "Ensemble Extreme Gradient Boosting (XGBoost) Classifier Model" performed comparatively better during the validation stage.

However, considering the "Churn" class precision score, which is one of the key performance metrics in this business case, the "Ensemble Extreme Gradient Boosting (XGBoost) Classifier Model" performed significantly better than the "Ensemble Random Forest Classifier Model".

Hence, the "Ensemble Extreme Gradient Boosting (XGBoost) Classifier Model" is selected as the final model to be applied to the unseen test data.
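The selection rule described above, comparing candidates by the precision of the "Churn" class, can be sketched in plain Python. The function names, the candidate names and the toy labels are hypothetical and only illustrate the comparison; they are not the project's validation results.

```python
def class_precision(y_true, y_pred, positive=1):
    """Precision for one class: correct positive predictions / all positive predictions."""
    true_labels_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_labels_of_predicted:
        return 0.0  # the model never predicted the positive class
    hits = sum(1 for t in true_labels_of_predicted if t == positive)
    return hits / len(true_labels_of_predicted)


def pick_by_churn_precision(y_val, predictions, churn_label=1):
    """Return the candidate with the highest churn-class precision and all scores."""
    scores = {
        name: class_precision(y_val, y_pred, positive=churn_label)
        for name, y_pred in predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy validation labels (1 = churn) and predictions from two hypothetical models.
y_val = [1, 0, 1, 0, 0, 1, 0, 0]
predictions = {
    "candidate_a": [1, 1, 1, 0, 1, 0, 0, 0],
    "candidate_b": [1, 0, 1, 0, 0, 0, 0, 0],
}

best, scores = pick_by_churn_precision(y_val, predictions)
print(best, scores)
```

In this toy case `candidate_b` flags fewer customers but every flagged customer is a true churner, so it wins on churn precision even though it misses one churner.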


Performance Results on the Unseen Test Data:



As the test performance results above show, the Extreme Gradient Boosting (XGBoost) model achieved an overall accuracy of about 90% on the unseen test data.
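For reference, the four metrics used in this project (accuracy, precision, recall and F1-score) can be computed from test predictions as sketched below. The function name `binary_metrics` and the toy labels are assumptions made for illustration and are not the project's test data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy test labels (1 = churn): two churners among ten customers.
y_test = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_test, y_pred)
print(metrics)
```

In this toy example the accuracy is 0.9 while the churn recall is only 0.5, which illustrates why class-level metrics are reported alongside overall accuracy on imbalanced churn data.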