Project aim: Build a churn prediction model (binary classification) and deploy it in a Docker container, both on a local Linux machine and on AWS Elastic Beanstalk using awsebcli.
Data Source: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
Project Requirements: Python 3.9, pandas, NumPy, scikit-learn, and Docker
Operating System: Linux
Tools: Visual Studio Code and the Linux command line
The project involved the following steps:
- Downloading, loading, cleaning, and preparing data for analysis
- Performing exploratory data analysis
- Performing a feature importance analysis using the concept of risk ratio
- Performing correlation analysis for feature selection (both analyses are sketched after this list)
- Performing feature engineering, which involves one-hot encoding of categorical variables
- Fitting and training a logistic regression binary classification model using scikit-learn's LogisticRegression class (see the training sketch below)
- Evaluating the model using a basic metric, accuracy
- Identifying the probability cut-off that achieves maximum accuracy
- Computing AUC-ROC, which is more informative given the target class imbalance (see the evaluation sketch below)
- Using the pickle library to save and load the model (sketched below)
- Creating a basic Flask app, churn-serving.py, that predicts churn for a single observation (see the app sketch below)
- Testing the Flask app's predictions using the requests Python library (sketched below)
- Using pipenv to manage project dependencies; this creates the Pipfile and Pipfile.lock files, which pin the exact dependencies needed to run the application
- Installing gunicorn via pipenv, a production-ready WSGI server that handles HTTP requests reliably
- Installing Docker for creating and running containers
- Creating a Dockerfile and adding the build instructions (see the Dockerfile sketch below)
- Building the churn-prediction Docker image
- Running the Docker container and testing the model's predictions
- Testing the deployment locally, then deploying it on AWS using the Elastic Beanstalk CLI (awsebcli)
- Testing the prediction using the AWS application host's IP address
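
The sketches below illustrate the main steps; names, columns, and endpoints are assumptions unless the list above states them. First, a minimal sketch of the risk ratio and correlation analyses, assuming the raw Kaggle CSV is loaded into a pandas DataFrame with a binary churn column (the feature lists are illustrative):

```python
import pandas as pd

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.columns = df.columns.str.lower()

# Normalise string values so category labels are consistent.
for col in df.dtypes[df.dtypes == 'object'].index:
    df[col] = df[col].str.lower().str.replace(' ', '_')
df['churn'] = (df['churn'] == 'yes').astype(int)

global_churn = df['churn'].mean()

# Risk ratio: churn rate within each category divided by the global churn
# rate. Values well above 1 mark groups more likely to churn.
for col in ['gender', 'partner', 'contract']:  # illustrative categorical columns
    risk_ratio = df.groupby(col)['churn'].mean() / global_churn
    print(risk_ratio)

# Correlation of numerical features with the target, for feature selection.
numerical = ['tenure', 'monthlycharges']  # illustrative numerical columns
print(df[numerical].corrwith(df['churn']))
```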
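
A sketch of the feature engineering and training step. It assumes scikit-learn's DictVectorizer for the one-hot encoding, which is one common choice rather than necessarily the project's exact approach; the split and feature list are also illustrative:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# df comes from the previous sketch.
df_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
y_train = df_train['churn'].values
y_test = df_test['churn'].values

features = ['gender', 'partner', 'contract', 'tenure', 'monthlycharges']

# DictVectorizer one-hot encodes string (categorical) values and passes
# numerical values through unchanged.
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(df_train[features].to_dict(orient='records'))
X_test = dv.transform(df_test[features].to_dict(orient='records'))

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```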
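
A sketch of the evaluation steps, continuing from the training sketch: accuracy at the default 0.5 cut-off, a sweep over cut-offs to find the one maximising accuracy, and AUC-ROC, which is threshold-independent and therefore more robust to the class imbalance:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = model.predict_proba(X_test)[:, 1]  # churn probabilities

# Accuracy at the default 0.5 cut-off.
print('accuracy @ 0.5:', accuracy_score(y_test, (y_pred >= 0.5).astype(int)))

# Sweep cut-offs to find the one that maximises accuracy.
thresholds = np.linspace(0, 1, 101)
accuracies = [accuracy_score(y_test, (y_pred >= t).astype(int)) for t in thresholds]
best = thresholds[int(np.argmax(accuracies))]
print('best cut-off:', best, 'accuracy:', max(accuracies))

# AUC-ROC does not depend on a cut-off.
print('auc:', roc_auc_score(y_test, y_pred))
```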
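
Saving and loading the model with pickle. Storing the vectorizer alongside the model lets the serving app encode incoming observations the same way; the file name is illustrative:

```python
import pickle

# Save the fitted vectorizer and model together.
with open('churn-model.bin', 'wb') as f_out:
    pickle.dump((dv, model), f_out)

# Load them back, e.g. inside the serving app.
with open('churn-model.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)
```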
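
A minimal sketch of what churn-serving.py might look like; the route name, port, and payload format are assumptions rather than the project's exact API:

```python
import pickle
from flask import Flask, request, jsonify

with open('churn-model.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)

app = Flask('churn')

@app.route('/predict', methods=['POST'])
def predict():
    customer = request.get_json()  # a single observation as a JSON object
    X = dv.transform([customer])
    churn_probability = model.predict_proba(X)[0, 1]
    return jsonify({
        'churn_probability': float(churn_probability),
        'churn': bool(churn_probability >= 0.5),
    })

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=9696)
```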
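
Testing the running app with the requests library; the URL and sample customer are illustrative:

```python
import requests

url = 'http://localhost:9696/predict'  # assumed local endpoint
customer = {
    'gender': 'female',
    'partner': 'yes',
    'contract': 'month-to-month',
    'tenure': 1,
    'monthlycharges': 29.85,
}
response = requests.post(url, json=customer)
print(response.json())
```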
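
Finally, a sketch of the kind of Dockerfile this setup suggests, assuming dependencies are installed from the Pipfile and gunicorn serves the app on port 9696; the base image, file names, and port are assumptions:

```dockerfile
FROM python:3.9-slim

RUN pip install pipenv

WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]

# Install the locked dependencies into the system Python
# (no virtualenv needed inside the container).
RUN pipenv install --system --deploy

COPY ["churn-serving.py", "churn-model.bin", "./"]

EXPOSE 9696

ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "churn-serving:app"]
```

With such a Dockerfile in place, `docker build -t churn-prediction .` builds the image and `docker run -it -p 9696:9696 churn-prediction` runs the container for local testing.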