MBA Placement Prediction

Description

The dataset contains placement data for MBA students. It includes their secondary and higher secondary school percentages and specializations, as well as degree information, work experience, and the salary offered to the students who were placed.

The goal of this project is to build ML models that predict which students will get placed and the salary offered to the placed students.

The files starting with ‘status*’ are for the classification problem where the model predicts whether a student will get placed or not. The files starting with ‘salary*’ are for the regression problem where the model predicts the salary offered to the placed students.

For the status prediction problem, I have trained three models: LogisticRegression, RandomForestClassifier and XGBClassifier.

For the salary prediction problem, I have trained four models: LinearRegression, ElasticNet, RandomForestRegressor and XGBRegressor.

I have created a Pipeline for each of these models, so that the data transformation and model training/prediction steps are assembled together.
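As an illustration of the idea, here is a minimal sketch of such a pipeline for the status problem. It is not the project's actual code: the column names, the toy rows, and the choice of StandardScaler and OneHotEncoder as transformation steps are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are assumptions,
# not the project's real schema.
df = pd.DataFrame({
    "ssc_p": [60.0, 75.0, 68.0, 52.0, 88.0, 55.0, 47.0, 81.0],
    "hsc_p": [70.0, 80.0, 66.0, 50.0, 77.0, 51.0, 48.0, 69.0],
    "specialisation": ["A", "B", "B", "A", "B", "A", "A", "B"],
    "workex": ["No", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
    "status": [1, 1, 1, 0, 1, 0, 0, 1],  # 1 = placed, 0 = not placed
})

numeric_cols = ["ssc_p", "hsc_p"]
categorical_cols = ["specialisation", "workex"]

# Data transformation: scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Transformation and model live in one object, so fit() and predict()
# always apply the same preprocessing.
status_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

X = df[numeric_cols + categorical_cols]
y = df["status"]
status_pipeline.fit(X, y)
predictions = status_pipeline.predict(X)
print(predictions)
```

Swapping the final step for another estimator (for example RandomForestClassifier) leaves the preprocessing untouched, which is the main convenience of bundling the steps.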

I have done EDA and feature selection separately for each of the two problems.

Tech Stack and concepts used

  • Python
  • Scikit-learn
  • Machine Learning Pipeline
  • Docker
  • Streamlit

Setup

  • Clone the project repo and open it.

Virtual Environment

  • Create a virtual environment for the project using

    pipenv shell

  • Install required packages using

    pipenv install

Docker Container

  • Build the docker image using

    sudo docker build -t mba_placement .

  • Run the docker container using

    sudo docker run -p 5000:5000 mba_placement

  • Open the URL http://localhost:5000/ to run and test the app.

Deploying to Cloud

Status Prediction Results

Model                    Validation Set Accuracy   Training+Validation Set Accuracy
LogisticRegression       95.35 %                   89.53 %
RandomForestClassifier   93.02 %                   96.51 %
XGBClassifier            97.67 %                   98.84 %

Selected Model (XGBClassifier) Test Set Accuracy = 83.72 %
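The column headers above suggest a train/validation/test workflow. The sketch below shows one plausible reading of it: compare candidates on a validation set, refit the best one on training+validation data, then score it once on the test set. The synthetic data, the 60/20/20 split, and the two candidate models are assumptions for illustration, not the project's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the placement data (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Assumed 60/20/20 split into training, validation and test sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Step 1: fit on the training set and compare on the validation set.
val_accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_accuracy[name] = accuracy_score(y_val, model.predict(X_val))

# Step 2: refit the best candidate on training+validation data.
best_name = max(val_accuracy, key=val_accuracy.get)
best_model = candidates[best_name]
best_model.fit(X_trainval, y_trainval)

# Step 3: report accuracy once on the held-out test set.
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(f"{best_name}: test accuracy = {100 * test_accuracy:.2f} %")
```

Keeping the test set out of model selection is what makes the final number an honest estimate, which is also why it can come out lower than the validation scores.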

Salary Prediction Results

Model                    Validation Set RMSE   Training+Validation Set RMSE
LinearRegression         72827.10              84563.22
ElasticNet               58410.36              89727.80
RandomForestRegressor    58509.22              51349.49
XGBRegressor             60382.35              53142.14

Selected Model (RandomForestRegressor) Test Set RMSE = 92649.18
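For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the salary. A minimal plain-Python sketch follows; the salary figures and the helper name `rmse` are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical salaries (not taken from the dataset).
actual_salaries = [200000, 300000, 250000]
predicted_salaries = [210000, 280000, 250000]

error = rmse(actual_salaries, predicted_salaries)
print(f"RMSE = {error:.2f}")
```

Because the errors are squared before averaging, a few large misses on high salaries can dominate the score.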