Employee Future Prediction

Description

The dataset consists of dummy employee details of a company. It includes each employee's education, joining year, office city, payment tier, age, gender, whether the employee has ever been kept out of projects for one month or more, and experience in the current field. The task is to predict whether an employee will leave the company in the next 2 years.

For this project, I first performed exploratory data analysis (EDA) and feature selection. I then trained three models: LogisticRegression, RandomForestClassifier and XGBClassifier.
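The README does not detail the EDA or the feature-selection method. As a rough illustration only, the sketch below builds a small synthetic frame whose column names (Education, JoiningYear, City, PaymentTier, Age, Gender, EverBenched, ExperienceInCurrentDomain, LeaveOrNot) are assumptions modeled on the features in the description. It computes a per-category leave rate as a basic EDA view and ranks features by mutual information using scikit-learn's `mutual_info_classif`. Mutual information is one possible selection criterion, chosen here for illustration; the project may have used a different one. The helpers `make_toy_employees`, `leave_rate_by` and `rank_features_by_mi` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def make_toy_employees(n=200, seed=0):
    """Small synthetic frame shaped like the dataset (all values are made up)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Education": rng.choice(["Bachelors", "Masters", "PHD"], n),
        "JoiningYear": rng.integers(2012, 2019, n),
        "City": rng.choice(["Bangalore", "Pune", "New Delhi"], n),
        "PaymentTier": rng.integers(1, 4, n),
        "Age": rng.integers(22, 42, n),
        "Gender": rng.choice(["Male", "Female"], n),
        "EverBenched": rng.choice(["No", "Yes"], n),
        "ExperienceInCurrentDomain": rng.integers(0, 8, n),
    })
    # Made-up target that depends only on JoiningYear, so the ranking has a clear signal.
    df["LeaveOrNot"] = (df["JoiningYear"] >= 2017).astype(int)
    return df


def leave_rate_by(df, column, target="LeaveOrNot"):
    """Share of employees who left, per value of `column` (a basic EDA view)."""
    return df.groupby(column)[target].mean().sort_index()


def rank_features_by_mi(df, target="LeaveOrNot"):
    """Rank features by mutual information with the target, highest first."""
    X = df.drop(columns=[target])
    # Encode non-numeric columns as integer codes; every column here is discrete.
    for col in X.columns:
        if not pd.api.types.is_numeric_dtype(X[col]):
            X[col] = X[col].astype("category").cat.codes
    scores = mutual_info_classif(X, df[target], discrete_features=True)
    return pd.Series(scores, index=X.columns).sort_values(ascending=False)


df = make_toy_employees()
print(leave_rate_by(df, "EverBenched"))
print(rank_features_by_mi(df).head(3))
```

On real data, features with near-zero mutual information would be candidates for removal, though the threshold is a judgment call.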

I created a Pipeline for each of these models, so that the data transformation and the model training/prediction steps are chained together in a single object.
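A minimal sketch of such a Pipeline, assuming hypothetical column names modeled on the description, one-hot encoding for the categorical columns and standard scaling for the numeric ones (both preprocessing choices are assumptions, not necessarily what the project uses). XGBClassifier is omitted so the sketch runs with scikit-learn alone; because it follows the scikit-learn estimator interface, it could be passed to `build_pipeline` in the same way. The names `toy_frame`, `build_pipeline`, `CATEGORICAL`, `NUMERIC` and `MODELS` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column split, modeled on the features listed in the description.
CATEGORICAL = ["Education", "City", "Gender", "EverBenched"]
NUMERIC = ["JoiningYear", "PaymentTier", "Age", "ExperienceInCurrentDomain"]


def toy_frame(n=120, seed=0):
    """Synthetic feature frame and 0/1 target, for illustration only."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "Education": rng.choice(["Bachelors", "Masters", "PHD"], n),
        "City": rng.choice(["Bangalore", "Pune", "New Delhi"], n),
        "Gender": rng.choice(["Male", "Female"], n),
        "EverBenched": rng.choice(["No", "Yes"], n),
        "JoiningYear": rng.integers(2012, 2019, n),
        "PaymentTier": rng.integers(1, 4, n),
        "Age": rng.integers(22, 42, n),
        "ExperienceInCurrentDomain": rng.integers(0, 8, n),
    })
    # Made-up rule so that the target is learnable; not derived from real data.
    y = ((X["JoiningYear"] >= 2017) | (X["EverBenched"] == "Yes")).astype(int)
    return X, y


def build_pipeline(model):
    """Chain preprocessing and a classifier so fit/predict run both steps together."""
    preprocess = ColumnTransformer([
        # handle_unknown="ignore" lets predict() accept category values unseen in training.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


MODELS = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

X, y = toy_frame()
for name, model in MODELS.items():
    pipe = build_pipeline(model).fit(X, y)
    # Predict on one raw row; the fitted encoder and scaler are applied automatically.
    print(name, pipe.predict(X.iloc[[0]]))
```

Because preprocessing lives inside the Pipeline, `predict` on a raw row applies exactly the encoding and scaling fitted during training, which is what allows an app to pass user input straight to the model.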

Tech Stack and Concepts Used

  • Python
  • Scikit-learn
  • XGBoost
  • Machine Learning Pipeline
  • Docker
  • Streamlit

Setup

  • Clone the project repository and change into its directory.

Virtual Environment

  • Create and activate a virtual environment for the project using

    pipenv shell
  • Install the required packages using

    pipenv install

Docker Container

  • Build the Docker image using

    sudo docker build -t employee_future .
  • Run the Docker container using

    sudo docker run -p 5000:5000 employee_future
  • Open http://localhost:5000/ in a browser to use and test the app.

Deploying to Cloud

Prediction Results

Model                    Validation Set Accuracy    Training+Validation Set Accuracy
LogisticRegression       81.20 %                    80.12 %
RandomForestClassifier   85.28 %                    89.04 %
XGBClassifier            85.93 %                    87.37 %

Selected model (RandomForestClassifier): test set accuracy = 86.57 %
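The README does not state how the data was split or how the final model was chosen. One common protocol, sketched here under stated assumptions, holds out a test set, compares models by validation accuracy, refits the chosen model on training plus validation data, and evaluates it once on the test set. `make_classification` stands in for the real data, the 60/20/20 split and the "highest validation accuracy" selection rule are assumptions, and `select_and_test` is a hypothetical helper. In the project, each entry of `models` would be a full preprocessing-plus-model Pipeline.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def select_and_test(models, X, y, seed=0):
    """Pick the model with the best validation accuracy, then score it once on the test set."""
    # Hold out 20 % for testing, then 25 % of the remainder for validation (60/20/20 overall).
    X_trval, X_test, y_trval, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trval, y_trval, test_size=0.25, stratify=y_trval, random_state=seed)

    val_acc = {}
    for name, model in models.items():
        # clone() gives an unfitted copy, so the caller's estimators are left untouched.
        fitted = clone(model).fit(X_train, y_train)
        val_acc[name] = accuracy_score(y_val, fitted.predict(X_val))

    best = max(val_acc, key=val_acc.get)
    # Refit the chosen model on training + validation data before the single test evaluation.
    final = clone(models[best]).fit(X_trval, y_trval)
    test_acc = accuracy_score(y_test, final.predict(X_test))
    return val_acc, best, test_acc


# Stand-in data; the real project would use the employee dataset and its Pipelines.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}
val_acc, best, test_acc = select_and_test(models, X, y)
for name, acc in val_acc.items():
    print(f"{name}: validation accuracy = {100 * acc:.2f} %")
print(f"Selected model ({best}) test set accuracy = {100 * test_acc:.2f} %")
```

Touching the test set only once, after selection, keeps the reported test accuracy an unbiased estimate for the chosen model.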