HR-Analytics-EDA-Model-Building-Tuning

In this notebook, I analyze a Human Resources dataset. I build several models and compare the accuracy each one achieves.

Data Understanding & Problem

The dataset contains the following columns:

  • satisfaction_level
  • last_evaluation
  • number_project
  • average_montly_hours
  • time_spend_company
  • Work_accident
  • left
  • promotion_last_5years
  • department
  • salary

Problem:

We want to predict whether an employee left the company. In the target column left, a value of 1 means the employee left the company.
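The problem setup can be sketched roughly as follows: separate the target column left from the remaining columns, then turn the text columns into numeric ones. This is only a minimal illustration, not the notebook's actual code. The rows are made up, the category values for department and salary are assumptions, and the helper split_features_target is a hypothetical name.

```python
import pandas as pd

# Made-up rows for illustration only; they are NOT taken from the real dataset.
# The category values for "department" and "salary" are assumptions.
toy = pd.DataFrame({
    "satisfaction_level": [0.30, 0.75, 0.15, 0.90],
    "last_evaluation": [0.50, 0.80, 0.95, 0.70],
    "number_project": [2, 4, 6, 3],
    "average_montly_hours": [150, 210, 290, 180],
    "time_spend_company": [3, 2, 5, 4],
    "Work_accident": [0, 0, 0, 1],
    "left": [1, 0, 1, 0],
    "promotion_last_5years": [0, 0, 0, 1],
    "department": ["sales", "sales", "hr", "technical"],
    "salary": ["low", "medium", "medium", "high"],
})


def split_features_target(df, target="left"):
    """Return (X, y): X holds every column except the target, y the 0/1 labels."""
    X = df.drop(columns=[target])
    y = df[target].astype(int)
    return X, y


X, y = split_features_target(toy)

# One-hot encode the two text columns so that models receive numeric input.
X_encoded = pd.get_dummies(X, columns=["department", "salary"])

print(X_encoded.shape)
print(y.tolist())
```

One-hot encoding is just one possible choice here; salary could also be mapped to an ordered integer scale if its levels are treated as ordinal.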

This Notebook Contains:

  • Data Understanding
  • Data Exploration
  • Data Preparation
  • Data Visualization
  • Feature Engineering
  • Build Machine Learning Models
  • Machine Learning Models With Cross Validation
  • Model Evaluation
  • Hyperparameter Tuning

Data Visualization

Salary

The relationship between salary level and whether the employee left the company:

[Figure: Salary vs. Left]

The correlation matrix between the different features and the target variable

[Figure: Correlation]
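A correlation matrix of this kind can be computed with pandas. The sketch below uses synthetic data, not the HR dataset, and the rule that generates the left column is an assumption made purely so the example has something to correlate.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; column names follow the dataset, values do not.
satisfaction = rng.uniform(0, 1, n)
hours = rng.integers(100, 300, n)
# Demo assumption: lower satisfaction makes leaving more likely.
left = (satisfaction + rng.normal(0, 0.2, n) < 0.4).astype(int)

df = pd.DataFrame({
    "satisfaction_level": satisfaction,
    "average_montly_hours": hours,
    "left": left,
    "salary": rng.choice(["low", "medium", "high"], n),
})

# Keep only numeric columns: Pearson correlation is undefined for text columns.
corr = df.select_dtypes(include="number").corr()

# Correlation of each feature with the target, sorted from most negative up.
target_corr = corr["left"].drop("left").sort_values()
print(target_corr)
```

Text columns such as salary have to be encoded as numbers before they can appear in the matrix.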

Feature Importance

[Figure: Feature Importance]

Accuracy with all features:

[Figure: Accuracy with all features]

Accuracy with important features:

[Figure: Accuracy with important features]

Machine Learning Models With Cross Validation

[Figure: Accuracy with CV]

Machine Learning Models With Parameter Tuning

[Figure: Accuracy with GridSearch Tuning]

Final Results:

Accuracy Comparison:

[Figure: Accuracy Comparison]

Conclusion

The results can be described as follows:

  • Here we can compare the accuracy obtained by different classification models under different strategies. As a quick recap:
  • Accuracy with all features: all features of the data were used to predict whether an employee left. This accuracy is measured on test data that was not used in training.
  • Accuracy with important features: the same as above, but only the 5 most important features were used. The feature importances were obtained from a Random Forest Classifier.
  • Accuracy with CV: the mean of the accuracies obtained across the iterations of one cross-validation run. Here 10 iterations were used.
  • Accuracy with GridSearchCV: the best score obtained after tuning the model. Here only 5 folds were used for CV.
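The four accuracy figures described above could be computed along these lines with scikit-learn. This is a hedged sketch, not the notebook's code: the data is synthetic (make_classification), only a Random Forest is shown, and the split ratio, the hyperparameter grid, and the choice of which data each step is fitted on are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the HR data: 9 features, binary target.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 1) Accuracy with all features, measured on the held-out test data.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
acc_all = accuracy_score(y_test, model.predict(X_test))

# 2) Accuracy with the 5 most important features, ranked by the forest above.
top5 = np.argsort(model.feature_importances_)[::-1][:5]
model_top = RandomForestClassifier(n_estimators=50, random_state=0)
model_top.fit(X_train[:, top5], y_train)
acc_top5 = accuracy_score(y_test, model_top.predict(X_test[:, top5]))

# 3) Accuracy with CV: mean accuracy over 10 folds (assumed to use all the data).
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=10, scoring="accuracy",
)
acc_cv = cv_scores.mean()

# 4) Accuracy with GridSearchCV: best mean score over 5 folds.
#    The grid below is a small hypothetical example.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    cv=5, scoring="accuracy",
)
grid.fit(X_train, y_train)
acc_grid = grid.best_score_

for name, acc in [("all features", acc_all), ("top 5 features", acc_top5),
                  ("10-fold CV", acc_cv), ("GridSearchCV", acc_grid)]:
    print(f"{name}: {acc:.3f}")
```

Note that the first two numbers are test-set accuracies, while the last two are cross-validated means, so they are not measured on the same data and should be compared with that in mind.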