Predict Customer Churn

"Predict Customer Churn" project from the Udacity ML DevOps Engineer Nanodegree.

Project Description

The goal of this project is to turn a POC (Proof of Concept) into a production-ready project by applying the good practices from Chapter 1 of the Udacity MLOps course, such as:

Code refactoring
Comments, DocString, etc
Modularity
Testing
Logging
AutoPEP8 and Pylint

The base project was provided by Udacity. It predicts customer churn for bank customers.

The project follows this workflow:

EDA: Explore and analyze the dataset.
FEATURE-ENGINEERING: Adapt the dataset for training.
TRAINING: Train the ML models. Here, two classification models from the sklearn library are trained: a Random Forest and a Logistic Regression.
POST-TRAINING: To understand what influences the models, the SHAP library is used to analyze the features after training. It identifies the features with the highest impact on predictions.
STORAGE: Save the best models and their associated metrics for tracking and for reuse in later inference.

Files and data description

The project is organized as follows:

PROJECT_FOLDER
- data
  - bank_data.csv --> csv dataset
- images --> contains model scores, confusion matrix, ROC curve
  - eda --> results of the EDA
  - results --> Training and evaluation results
- logs --> logs generated during testing
- models --> contains saved models in .pkl format
- churn_library.py --> Main entry file containing all the functions
- churn_notebook.ipynb --> Jupyter notebook leveraging churn_library.py for step-by-step execution
- churn_script_logging_and_testing.py --> testing script for churn_library.py
- constant.py --> contains constants such as the columns to process
- requirements_py3.X.txt --> requirements for the execution

All Python files follow PEP 8 and were rated by Pylint as follows (scores out of 10):

churn_library.py = 8.28
constant.py = 10
churn_script_logging_and_testing.py = 8.32

Running Files

This project was tested with Python 3.8 and the packages listed in requirements_py3.8.txt.

To install required packages, run the following command:

pip install -r requirements_py3.8.txt

How to run the project

The project can be run from two entry points:

churn_library.py, run as a script:

python churn_library.py

churn_notebook.ipynb, opened in Jupyter and run step by step

How to test the functionality

Each function in this project can be tested with a dedicated test module. To run the tests:

python churn_script_logging_and_testing.py