Customer Churn Prediction - Machine Learning

Problem Statement

This project applies machine learning to customer churn prediction. The objective is to test various classical machine learning algorithms and find which predicts customer churn most accurately. It also compares the algorithms systematically and measures how data refinement (balancing and feature selection) affects the same algorithms.

Keywords - Customer Churn, Classification, Prediction, Logistic Regression, Support Vector Machine, Naive Bayes Classification

Introduction and Methodology

Customer churn, also referred to as subscriber churn or logo churn, is the proportion of subscribers who terminate their subscriptions over a given period, and is commonly expressed as a percentage. Churn prediction and analysis is one of the most widespread applications of classical machine learning. Churn is a critical metric because it reflects customer satisfaction at the macro scale. The telecom sector generally sees higher churn rates than other sectors, which creates strong demand for better prediction models.
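As a worked definition, one common way to write the churn rate for a period is shown below, where N_lost is the number of subscribers who cancelled during the period and N_start is the number at its start (the symbols are illustrative, not taken from the project):

```latex
\[
\text{Churn rate} = \frac{N_{\text{lost}}}{N_{\text{start}}} \times 100\%
\]
```

For example, losing 50 of 1,000 subscribers in a month gives a monthly churn rate of 5%.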

To train the models, the following steps were carried out in order:

Data cleaning: a check for duplicate and missing values showed the data to be complete and consistent.
Exploratory Data Analysis and Data Preprocessing: categorical features were converted to numerical features, the trend of each feature against churn (y) was analysed, and units were converted where required.
Correlation: a correlation matrix was computed to find linear relationships between pairs of variables.
Encoding: preprocessing was completed and the features were encoded.
Generalized Linear Model: relationships between the predictor variables and the response variable were determined from the p-values.
Feature Scaling: the independent features were standardised so that they share a common scale.
Classification Models: the four models used are as follows:

Binary Logistic Regression
Support Vector Machine (SVM)
Naive Bayes Classifier
Random Forest Classifier
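The project also mentions a Naive Bayes classifier written from scratch. The source does not say which variant was used, so the sketch below assumes the Gaussian variant for numeric features: each feature is modelled as normally distributed within each class, features are treated as independent given the class, and the class with the highest log-posterior wins. The class name `GaussianNaiveBayes` and the toy data are illustrative only.

```python
import math


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes (assumed variant): per-class normal
    likelihood for each feature, features independent given the class."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.means, self.vars = {}, {}, {}
        n_features = len(X[0])
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            # Small epsilon avoids division by zero for constant features.
            self.vars[c] = [
                sum((r[j] - self.means[c][j]) ** 2 for r in rows) / len(rows) + 1e-9
                for j in range(n_features)
            ]
        return self

    def _log_posterior(self, x, c):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj); the evidence term is
        # the same for every class, so it is omitted.
        total = math.log(self.priors[c])
        for xj, m, v in zip(x, self.means[c], self.vars[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return total

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]
```

For example, `GaussianNaiveBayes().fit(X, y).predict(X_new)` returns one predicted label per row of `X_new`. Working in log space avoids numerical underflow when many small likelihoods would otherwise be multiplied.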

SMOTE (Synthetic Minority Over-sampling Technique) was applied to balance the data.
Features were selected on the basis of the correlation matrix and Principal Component Analysis (PCA).
The models were evaluated with the confusion matrix, accuracy, precision, recall and F1 score.
A Naive Bayes classifier implemented from scratch was also examined.
Logistic Regression was analysed by varying its parameters and specifications.
SVM was analysed by varying its parameters, and the optimal combination was chosen using grid search.
ROC curves were plotted and AUC values computed for analysis.
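The SMOTE balancing step can be sketched in a few lines. This is a simplified, hypothetical version (the project may instead have used a library such as imbalanced-learn): each synthetic sample is placed at a random point on the segment between a randomly chosen minority sample and one of its k nearest minority-class neighbours. The function name `smote`, the default `k=5` and the seed are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch: create n_synthetic new minority-class rows by
    interpolating between a minority row and one of its k nearest minority
    neighbours. Requires at least two minority rows."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

To balance a training set with, say, 800 non-churn and 200 churn rows, one would call `smote(churn_rows, 600)` and append the results with label 1. Oversampling should be applied to the training split only, so that synthetic points never leak into the test set.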

Results

Before and After Data Balancing:

Feature selection based on correlation and PCA:
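The correlation half of this feature selection can be sketched as a simple filter: compute the Pearson correlation between features and drop any feature that is too strongly correlated with one already kept. The 0.9 threshold, the function names and the feature names in the usage note are hypothetical, and the PCA step is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # treat constant columns as uncorrelated
    return cov / (sa * sb)


def drop_correlated(columns, threshold=0.9):
    """Keep features in their given order, dropping any whose |r| with an
    already-kept feature exceeds `threshold`.
    `columns` maps feature name -> list of values."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For instance, with toy columns `tenure`, `total_charges` (nearly proportional to tenure) and `monthly_charges`, the filter keeps `tenure` and `monthly_charges` and drops `total_charges`.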

Confusion matrices of the models:
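The evaluation metrics reported alongside these confusion matrices follow directly from the four cells of a binary confusion matrix. A minimal pure-Python sketch, assuming labels are 0 (no churn) and 1 (churn); the function names are illustrative:

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = churn."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:  # t == 0 and p == 0 (labels assumed to be 0/1)
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the churn (positive) class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 0, 1, 1, 0]`, there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy, precision, recall and F1 are all 0.75.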

Analysis of Logistic Regression across parameters and specifications:
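Fitting logistic regression by minimizing its loss function with gradient descent can be sketched as follows. This is an illustrative version, not the project's code: it standardizes the features first (as in the Feature Scaling step), then runs batch gradient descent on the mean binary cross-entropy loss. The learning rate, epoch count and function names are assumptions, and no regularization is applied.

```python
import math


def standardize(X):
    """Z-score each column: (x - mean) / std. Returns scaled rows plus the
    training means and stds, which must be reused for new data."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # `or 1.0` leaves constant columns unscaled instead of dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean binary cross-entropy (log) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log loss, dL/dz = sigmoid(z) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted churn probability for each (already scaled) row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

New data must be scaled with the `means` and `stds` returned for the training set before calling `predict_proba`; rows with probability above 0.5 would then be predicted as churn.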

Analysis of SVM across different parameters, with the optimal parameters found using grid search:
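Grid search evaluates every combination in a parameter grid with cross-validation and keeps the best one. The sketch below shows that logic in pure Python; `fit_and_score` is a hypothetical callback that would train an SVM with the given parameters on the training indices and return a score (for example accuracy) on the validation indices. In scikit-learn, `GridSearchCV` performs this search for estimators such as `SVC`; the helper names here are illustrative.

```python
import itertools
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Try every parameter combination in `param_grid` and return the one
    with the highest mean validation score over k folds.
    `fit_and_score(params, train_idx, val_idx)` is user-supplied."""
    folds = k_fold_indices(n_samples, k)
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train_idx, val_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

A grid such as `{"kernel": ["linear", "rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}` yields 18 combinations, each trained k times, so the cost grows multiplicatively with every parameter added.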

ROC-AUC Curve:
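The ROC curve and its AUC can be computed from each model's predicted churn scores. The sketch below assumes higher scores mean a customer is more likely to churn. It computes AUC as the probability that a randomly chosen churner is scored above a randomly chosen non-churner (ties count half), which equals the area under the empirical ROC curve, and also returns the (FPR, TPR) points that would be plotted. The function names are illustrative.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive (churn) and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over each distinct score."""
    P = sum(1 for t in y_true if t == 1)
    N = len(y_true) - P
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 0)
        points.append((fp / N, tp / P))
    return points
```

For `y_true = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, `roc_auc` returns 0.75: three of the four churner/non-churner pairs are ranked correctly.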

Conclusion

SMOTE was quite effective in our case because the churn data was imbalanced.
Naive Bayes accuracy showed an increasing trend.
Logistic Regression accuracy showed a decreasing trend.
After PCA, the final accuracy, F1 score and precision changed little.
For logistic regression, minimizing the loss function with gradient descent worked better.
For the SVC model, the optimal parameters were a linear kernel, C=1 and gamma=0.1 (note that in scikit-learn's SVC, gamma has no effect with a linear kernel).
Random Forest achieved the highest AUC (0.69).

References