The dataset contains dummy employee records for a company. Each record includes the employee's education level, joining year, office city, payment tier, age, gender, whether the employee was ever kept out of projects for one month or more, and experience in the current field. The task is to predict whether the employee will leave the company within the next 2 years.
For the project, I first performed EDA and feature selection. I then trained three models: LogisticRegression, RandomForestClassifier and XGBClassifier.
I created a Pipeline for each of these models so that the data transformation and the model training/prediction steps are assembled into a single object.
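The idea of bundling transformation and model into one Pipeline can be sketched as below. This is a minimal illustration, not the project's actual code: the column names, the toy data generator, and the choice of OneHotEncoder and StandardScaler as transformation steps are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's columns may differ.
CATEGORICAL = ["education", "city", "gender", "ever_benched"]
NUMERIC = ["joining_year", "payment_tier", "age", "experience"]


def make_toy_data(n=200, seed=0):
    """Generate random stand-in data with the assumed columns."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "education": rng.choice(["Bachelors", "Masters", "PhD"], n),
        "city": rng.choice(["CityA", "CityB", "CityC"], n),
        "gender": rng.choice(["Male", "Female"], n),
        "ever_benched": rng.choice(["Yes", "No"], n),
        "joining_year": rng.integers(2012, 2019, n),
        "payment_tier": rng.integers(1, 4, n),
        "age": rng.integers(22, 45, n),
        "experience": rng.integers(0, 8, n),
    })
    y = rng.integers(0, 2, n)  # 1 = leaves, 0 = stays (random labels)
    return X, y


def build_pipeline(model):
    """Chain preprocessing and an estimator into a single Pipeline."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


X, y = make_toy_data()
pipeline = build_pipeline(LogisticRegression(max_iter=1000))
pipeline.fit(X, y)                       # fits the transformers and the model together
predictions = pipeline.predict(X.head(5))  # raw rows go in; transformation is applied automatically
```

Any of the three estimators could be passed to `build_pipeline` in the same way, which is what makes it easy to compare models with identical preprocessing.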
- Python
- Scikit-learn
- Machine Learning Pipeline
- Docker
- Streamlit
- Clone the project repo and open it.
- Create a virtual environment for the project using `pipenv shell`
- Install the required packages using `pipenv install`
- Build the Docker image using `sudo docker build -t employee_future .`
- Run the Docker container using `sudo docker run -p 5000:5000 employee_future`
- Open the URL http://localhost:5000/ to run and test the app.
- Open the "Deploy an app" page of Streamlit.
- Enter the details of the GitHub repository that contains the streamlit_app.py file and the model binaries.
- Click the Deploy button.
- Open the URL https://share.streamlit.io/aniketsharma00411/employee_future_prediction/main to run and test the app.
Model | Validation Set Accuracy | Training+Validation Set Accuracy |
---|---|---|
LogisticRegression | 81.20 % | 80.12 % |
RandomForestClassifier | 85.28 % | 89.04 % |
XGBClassifier | 85.93 % | 87.37 % |
Selected Model (RandomForestClassifier) Test Set Accuracy = 86.57 %
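For readers who want to reproduce this kind of comparison, the following is a minimal sketch of one common evaluation protocol: score on a validation set, then retrain on training plus validation data and score once on the held-out test set. The synthetic data, the 60/20/20 split, and the hyperparameters are assumptions for illustration; the README does not state how the figures above were actually computed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the real project uses the employee dataset.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Assumed 60/20/20 split into training, validation and test sets.
X_full_train, X_test, y_full_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_full_train, y_full_train, test_size=0.25, random_state=0
)

# Step 1: fit on the training set and score on the validation set.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, model.predict(X_val))

# Step 2: refit on training + validation data and score once on the test set.
model.fit(X_full_train, y_full_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"Validation accuracy: {val_accuracy:.2%}")
print(f"Test accuracy: {test_accuracy:.2%}")
```

Keeping the test set untouched until the final step means the reported test accuracy is not influenced by model selection.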