- Project Predict Customer Churn of ML DevOps Engineer Nanodegree Udacity
The goal of this project is to turn a POC(Proof Of Concept) project into a Production Ready Project by leveraging all good practices from Udacity MLOPs Chapter 1 such as:
- Code refactoring
- Comments, DocString, etc
- Modularity
- Testing
- Logging
- AutoPEP8 and Pylint
The base project were provided by Udacity and it predicts, for bank customers, the customer churn.
The project follows the below workflow:
- EDA: Explore and Analyze the dataset
- FEATURE-ENGINEERING: Adapt the dataset for training
- TRAINING: Train ML Model. In our case, we train two classification models from sklearn library (A Random Forest and a Logistic Regression)
- POST-TRAINING: To understand what influence our models, we using SHAP library to post analyze the features. It allows to identify the features with higher impact on predictions.
- STORAGE: Save best models and associated metrics for tracking and reusage for other inferences.
The project is organized as follows:
- PROJECT_FOLDER
- data
- bank_data.csv --> csv dataset
- images --> contains model scores, confusion matrix, ROC curve
- eda --> results of the EDA
- results --> Training and evaluation results
- logs --> log generated during testing
- models --> contains saved models in .pkl format
- churn_library.py --> Main entry file containing all the functions
- churn_notebook.ipynb --> jupyter notebook leveraging churn_library.py for step-by-step execution
- churn_script_logging_and_testing.py --> testing script for churn_library.py
- constant.py --> ontains constant informations such as columns to process
- requirements_py3.X.txt --> requirements for the execution
- data
All python files were designed following pep8 rules and ratied as follows by pylint:
- churn_library.py = 8.28
- constant.py = 10
- churn_script_logging_and_testing.py = 8.32
How do you run your files? What should happen when you run your files?
The following project were tested using python 3.8 and all listed packages inside the requirements_py3.8.txt.
To install required packages, run the following command:
pip install -r requirements_py3.8.txt
The project can be run from two entry points:
- churn_library.py using
python churn_library.py
or
- churn_notebook.ipynb
For this project, each function can be tested using dedicated module. To run the tests:
python churn_script_logging_and_tests.py
It will print function under tests and for more detailed logging please check logs/churn_library.log file.