- Project Predict Customer Churn of ML DevOps Engineer Nanodegree Udacity
Customer Churn is a condition in which consumers do not continue or leave the services offered by an industry. If not handled, it will lead to loss of revenue and even loss. To overcome this, we need to know which consumers have the potential to churn, so that we can focus on grabbing those consumers and know the constraints in the services the industry provides.
This project seeks to answer the customer churn that is happening in the banking industry. The dataset used in this project is obtained from Kaggle.
Workflows used to answer these problems include:
- import data
- Exploratory Data Analysis : Explore data features, such as the distribution of values in data and analyze correlations between features in data
- Feature Engineering : These stages consist of: one-hot-encoding categorical features, standardization (homogenizing the range of numerical values), and data splitting
- Modeling : This stage consists of training two models (logistic regression and random forest), hyper-parameter tuning, model evaluation (ROC-Curve and confussion matrices)
The structure of this project directory tree is displayed as follows:
.
├── churn_library.py
├── churn_notebook.ipynb
├── churn_script_logging_and_tests.py
├── data
│ └── bank_data.csv
├── Guide.ipynb
├── images
│ ├── eda
│ │ ├── Churn.png
│ │ ├── Customer_Age.png
│ │ ├── Gender.png
│ │ ├── heatmap.png
│ │ ├── Marital_Status.png
│ │ └── Total_Trans_Amt.png
│ └── results
│ ├── feature_importances.png
│ ├── logistic_results.png
│ ├── rf_results.png
│ └── roc_curve_result.png
├── logs
│ └── churn_library.log
├── models
│ ├── logistic_model.pkl
│ └── rfc_model.pkl
├── __pycache__
│ ├── churn_library2.cpython-36.pyc
│ └── churn_library.cpython-36.pyc
├── README.md
└── requirements.txt
there are 4 main folders in the repository:
data
:location of data saved incsv
formatimages
: in this folder there are two sub-folders, namely:eda
andresults
. Theeda
sub-folder is used to store the results of visualizations of numerical, categorical, and heatmap data distributions of correlation values between variables. Theresults
sub-folder stores visualization and evaluation of models, such as: ROC-curve, feature importance, confussion matrix cards.logs
: This folder stores the logs of function test results on the churn_library.py filemodels
: This folder is used to store model objects with the ``.pkl` extension.
Other important files in this repository include:
churn_library.py
: This file is used to write down modeling functions, such as: reading data, eda, feature engineering (one-hot-encoding, standardization, and data splitting), as well as modeling with logistic regression and random forest (training, prediction, and evaluation)churn_script_logging_and_tests.py
: This file is used to test and logging workflow modeling written in the filechurn_library.py
.requirements.txt
: This file contains the libraries used in modeling and versions of those libraries.churn_notebook.ipynb
: A file containing a modeling workflow prototype before it is converted into a function in a filechurn_library.py
to clone this project, make sure you have git installed in your computer. If you have already installed git, run this command
git clone https://github.com/mohrosidi/udacity_customer_churn.git
Here is a list of libraries used in this repository:
autopep8==1.5.7
joblib==0.11
matplotlib==2.1.0
numpy==1.12.1
pandas==0.23.3
pylint==2.9.6
scikit-learn==0.22
seaborn==0.8.1
To be able to run this project, you must install python library using the following command:
pip install -r requirements.txt
To run the workflow, simply run the churn_library.py
in your terminal using command bellow:
ipython churn_library.py
In other conditions, suppose you want to change the configuration of the modeling workflow, such as: changing the path of the data location, adding other models, adding feature engineering stages. You can change it in churn_library.py
files. To test if your changes are going well, you need to do testing and logging.
To do testing and logging, you need to change a number of configurations in the churn_script_logging_and_tests.py
file, such as: target column name, categorical column name list, data location, etc. After that, run the following command in the terminal to perform testing and loggingAfter that, run the following command in the terminal to perform testing and logging:
ipython churn_script_logging_and_tests.py
Make sure the code you create complies with PEP 8
rules. To check it automatically, run pylint on the terminal.
pylint churn_library.py
pylint churn_script_logging_and_tests.py
Pylint
will provide recommendations for improvements in your code. A good code is a code that has a score close to 10.
To make repairs automatically, you can use autopep8.
autopep8 --in-place --aggressive --aggressive churn_script_logging_and_tests.py
autopep8 --in-place --aggressive --aggressive churn_library.py