
Staff_Attrition_Prediction

DataTalks.Club ML Zoomcamp Capstone Project 1

Objectives of Capstone Project:

  • Find a dataset
  • Explore and prepare the data
  • Train the best model
  • Export the notebook into a script
  • Put model into a web service
  • Deploy model locally with Docker
  • Deploy to the cloud

Find a Dataset

Overview and Objective of Project

In the dynamic world of business, understanding and predicting employee attrition is crucial. This project harnesses the power of data to forecast and analyze employee attrition trends, offering an opportunity to dive deep into HR analytics and explore the factors that influence employee turnover.

The data for this project is from Dataset Link

The aim of the project is to develop a predictive model that provides insights into employee retention, identifies patterns, and contributes to strategic talent management.

Explore and prepare the data

Exploration of the data was done in the ML_Zoomcamp_Attrition_Pred_Proj.ipynb Jupyter notebook.

  • Explore data
    • Checked the data structure and columns
    • Checked the number of features and observations in the data
    • Checked for inconsistencies in column names and corrected them
    • Checked the correlation of the features with the target variable (attrition)
  • Prepare data
    • Checked for missing values
    • Checked for outliers
    • Checked for duplicates
  • Train data
    • Categorical variables were encoded using scikit-learn's DictVectorizer.
    • Trained the best model using the Gradient Boosting Classifier, after hyperparameter tuning via the Train_model.py script showed it produced the best results; a minimal sketch of these steps appears after this list.
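
The snippet below is a minimal sketch of the preparation and training steps described above. The CSV file name, column names, and hyperparameter values are purely illustrative assumptions; the actual values are in the notebook and the Train_model.py script.

    # Minimal sketch of the prepare/train steps (file name, column names and
    # hyperparameters are illustrative assumptions, not the project's exact values).
    import pandas as pd
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("attrition_data.csv")                      # hypothetical file name
    df.columns = df.columns.str.lower().str.replace(" ", "_")   # fix inconsistent column names

    print(df.isnull().sum())      # missing values
    print(df.duplicated().sum())  # duplicates

    y = (df["attrition"] == "yes").astype(int)   # assumes a yes/no attrition column
    X = df.drop(columns=["attrition"])

    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

    dv = DictVectorizer(sparse=False)
    X_train_enc = dv.fit_transform(X_train.to_dict(orient="records"))
    X_val_enc = dv.transform(X_val.to_dict(orient="records"))

    model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=1)
    model.fit(X_train_enc, y_train)
    print("validation accuracy:", model.score(X_val_enc, y_val))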

Environment Setup

  • Set up a Pipenv virtual environment by opening a CLI on your system and executing the command below.
  • NB: All command line interface (CLI) commands need to be executed in the folder where the virtual environment was created and set up - the virtual environment for this project was set up in the folder "c:\midterm".
pip install pipenv
  • Install the following:

    • Gunicorn
    • flask
    • numpy
    • scikit-learn version "1.3.2"
    • requests
  • Execute the following in the CLI:

    Step 1

    pipenv shell
    

    Step 2

    pipenv install gunicorn flask numpy scikit-learn=="1.3.2" requests
    
  • To get copies of the project files and dependencies, clone the repo.

  • The cloned files should be placed in the virtual environment folder; clone via the command below.

    git clone https://github.com/kabiromohd/Staff_Attrition_Prediction.git
    

Model deployment to web services

  • Flask was used for web deployment via the predict.py script (a minimal sketch of such a script appears after the screenshot below). To test the Flask web deployment, execute the following:

    Step 1

    pipenv shell
    

    Step 2

    python predict.py
    
  • The screenshot below illustrates the output:

run predict
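
As a rough illustration, the predict.py Flask service could look like the sketch below. The model file name (model.bin), the /predict endpoint, and port 6090 (taken from the docker run mapping later in this README) are assumptions; the actual script in the repo may differ.

    # Hypothetical sketch of a Flask prediction service (predict.py).
    # model.bin, the /predict route and port 6090 are assumptions.
    import pickle
    from flask import Flask, request, jsonify

    with open("model.bin", "rb") as f_in:
        dv, model = pickle.load(f_in)   # DictVectorizer + trained model

    app = Flask("attrition")

    @app.route("/predict", methods=["POST"])
    def predict():
        employee = request.get_json()
        X = dv.transform([employee])
        prob = model.predict_proba(X)[0, 1]
        return jsonify({"attrition_probability": float(prob),
                        "attrition": bool(prob >= 0.5)})

    if __name__ == "__main__":
        app.run(debug=True, host="0.0.0.0", port=6090)

In production the same app object can be served with Gunicorn (e.g. gunicorn --bind 0.0.0.0:6090 predict:app), which is why Gunicorn is among the dependencies installed earlier.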

  • To test the Flask web service deployment, a sample data point has been created in the predict_test.py file (a sketch of such a test script appears after the screenshot below).

  • Open another fresh CLI and run the following:

    Step 1

    pipenv shell
    

    Step 2

    python predict_test.py
    
  • The screenshot below illustrates the output:

run predict test
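
For reference, a minimal sketch of what predict_test.py could contain is shown below. The employee record and the http://localhost:6090/predict URL are illustrative assumptions (the port matches the docker run mapping used later); the repo's script holds the actual data point.

    # Hypothetical sketch of predict_test.py: send one data point to the local service.
    # The employee fields and the URL/port are illustrative assumptions.
    import requests

    url = "http://localhost:6090/predict"

    employee = {
        "age": 35,
        "department": "Sales",
        "job_satisfaction": 3,
        "monthly_income": 5000,
        "overtime": "Yes",
    }

    response = requests.post(url, json=employee)
    print(response.json())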

Deploy model locally with Docker

  • Deploy the Flask web service to Docker locally by following these steps.

    • Install Docker Desktop; it needs to be running before building the image.
    • Create an account on the Docker website: Docker Web.
    • Creating a Docker account enables setting up a Docker repository, which can be used to push the Docker image created locally.
    • The Docker repo created for this project is "kabiromohd/data_science".
    • The Docker repository was created in order to get a URL for the capstone1 image.
  • To create the project Docker image:

    • Create a Docker repo - in my case, "kabiromohd/data_science".
    • You can create the Docker repo from the Docker web sign-in interface: Docker Web
    pipenv shell
    
    • Create the Docker image by running the following:
    docker build -t kabiromohd/data_science:capstone1 .
    
    • Then run the Docker image just created with this command:
    docker run -it --rm -p 6090:6090 kabiromohd/data_science:capstone1
    
    • If the two commands run successfully, the screenshot below will be seen:

Docker Deployment

  • To test the local Docker deployment:

    • Open another CLI in the virtual environment and run the commands below to see the prediction.
    • The output should be the same as for the Flask web service deployment.
    pipenv shell
    
    • Note: predict_test.py already contains a prepared data point to test the model deployed locally on Docker.

    Run the command below:

    python predict_test.py
    

This completes the local Docker deployment.

Deploy the Docker image to the cloud

  • For cloud deployment, Render was used.
  • A user account needs to be created on Render.
  • To deploy the Docker image to the cloud, open a CLI and run the following commands:
pipenv shell
  • Push the Docker image created above to the repo with the following command:
docker push kabiromohd/data_science:capstone1
  • The image pushed to the Docker website will appear as in the screenshot below:

Docker Web

  • Copy the Docker image URL from the Docker repo into Render.

Render Interface

  • Deploy the Docker image to the Render cloud service. See the screenshot below for the deployment output:

Render Deployment

Deployment Link

My Capstone 1 Deployment Link from Render

To interact with the Docker image deployed to the Render cloud service:

  • Copy the Render deployment link and place it in the predict_render.py script as "host".

  • predict_render.py also contains a prepared data point used to test the model deployed to the cloud.

  • For this project, the deployment link has already been provided in the predict_render.py script. It can be executed as illustrated below; a minimal sketch of such a script follows the commands.

  • Open the virtual environment CLI and run the following:

    pipenv shell
    

    followed by:

    python predict_render.py
    
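
Below is a minimal sketch of what predict_render.py might look like. The host value is a placeholder for the actual Render deployment link, and the /predict endpoint and sample employee record are assumptions carried over from the earlier sketches.

    # Hypothetical sketch of predict_render.py: test the model deployed on Render.
    # host is a placeholder; replace it with the real Render deployment link.
    import requests

    host = "https://your-render-service.onrender.com"   # hypothetical URL
    url = f"{host}/predict"

    employee = {
        "age": 35,
        "department": "Sales",
        "job_satisfaction": 3,
        "monthly_income": 5000,
        "overtime": "Yes",
    }

    response = requests.post(url, json=employee)
    print(response.json())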

Video illustration of Cloud Deployment test

See illustration video below:

Render.Cloud.Capstone1.mp4