This repo aims to showcase an end-to-end pipeline for a Random Forest model that predicts customer behavior from a set of features such as age, profession, education, and housing data. The model is then deployed with FastAPI and served through a web application that provides easy, intuitive access.
To start the project, make sure Docker Desktop is running, then run the following from this directory:
docker-compose up
- Dataset acquisition
- Read and pre-process data
- Feature engineering/encoding
- Model training (Random Forest Algorithm)
- Dockerised model deployment using FastAPI
- Web App frontend using Node (To-do)
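
As a rough illustration of the encoding step listed above, here is a minimal standard-library sketch. It is not the code used in this repo: the function names, the field names (`age`, `job`, `y`), and the job categories are assumptions chosen for the example. It one-hot encodes a categorical feature and maps the 'yes'/'no' target to 1/0.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`.

    If `value` is not in `categories`, every entry is 0.
    """
    return [1 if value == category else 0 for category in categories]


def encode_record(record, job_categories):
    """Turn one raw record into (features, label).

    Assumes hypothetical field names: 'age' (numeric), 'job' (categorical)
    and 'y' (the 'yes'/'no' target).
    """
    features = [record["age"]] + one_hot(record["job"], job_categories)
    label = 1 if record["y"] == "yes" else 0
    return features, label


# Hypothetical categories and record, for illustration only.
JOB_CATEGORIES = ["admin.", "technician", "services"]
sample = {"age": 35, "job": "technician", "y": "yes"}

features, label = encode_record(sample, JOB_CATEGORIES)
print(features, label)
```

In practice a library encoder would likely replace these helpers. The sketch only shows the shape of the data that the model training step receives.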
"Bank Marketing" (Moro, S., Rita, P., and Cortez, P. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306)
Dataset info:
The data relates to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').
- Finding the optimal number of bins for feature engineering:
  - Sturges' Rule: suited to small samples, i.e. roughly 200 observations or fewer
  - Freedman-Diaconis Rule: suited to larger samples
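
The two binning rules above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the repo's implementation: the function names and the sample `ages` list are hypothetical, and the fallback to Sturges' rule when the interquartile range is zero is a choice made for this example. Sturges' rule uses `ceil(log2(n)) + 1` bins. Freedman-Diaconis sets the bin width to `2 * IQR / n^(1/3)` and divides the data range by that width.

```python
import math
import statistics


def sturges_bins(values):
    """Number of bins by Sturges' rule: ceil(log2(n)) + 1."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Number of bins by the Freedman-Diaconis rule.

    Bin width is 2 * IQR / n^(1/3); the bin count is the data range
    divided by that width, rounded up.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumption for this sketch: with no spread in the middle half
        # of the data, fall back to Sturges' rule.
        return sturges_bins(values)
    width = 2 * iqr / n ** (1 / 3)
    return math.ceil((max(values) - min(values)) / width)


# Hypothetical sample: one customer of each age from 18 to 95.
ages = list(range(18, 96))
print("Sturges:", sturges_bins(ages))
print("Freedman-Diaconis:", freedman_diaconis_bins(ages))
```

Because the Freedman-Diaconis width depends on the interquartile range rather than the standard deviation, it is less sensitive to outliers, which is one reason to prefer it on larger or skewed samples.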
- Aurélien Géron. 2019. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O'Reilly Media, Inc.
Author: Muhammad Naufal Al Ghifari