homayoonkhadivi/HR-Analytics-EDA-Model-Building-Tuning

Jupyter Notebook

HR-Analytics-EDA-Model-Building-Tuning

In this notebook, I analyze a Human Resources dataset. I build several models and compare them by the accuracy they achieve.

Data Understanding & Problem

The dataset contains the following columns:

0 satisfaction_level
1 last_evaluation
2 number_project
3 average_montly_hours
4 time_spend_company
5 Work_accident
6 left
7 promotion_last_5years
8 department
9 salary

Problem:

We want to predict whether an employee left the company. In the target column left, 1 means the employee left the company and 0 means the employee stayed.
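As a minimal sketch of this setup, the target can be separated from the features with pandas. The small frame below is made up purely for illustration and only mimics a few of the dataset's column names; it is not the real HR data, and the names `X` and `y` are assumptions rather than the notebook's own variables.

```python
import pandas as pd

# Tiny made-up frame that only mimics the column layout; not the real HR data.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "number_project": [2, 5, 7, 5],
    "salary": ["low", "medium", "medium", "low"],
    "left": [1, 0, 1, 0],
})

# Target: 1 = the employee left, 0 = the employee stayed.
y = df["left"]

# Features: every column except the target.
X = df.drop(columns=["left"])

# Class balance of the target, useful to check before modelling.
print(y.value_counts().to_dict())
```

Keeping the target out of `X` matters here: a model that sees `left` among its inputs would report a meaningless accuracy.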

This Notebook Contains:

Data Understanding
Data Exploration
Data Preparation
Data Visualization
Feature Engineering
Build Machine Learning Models
Machine Learning Models With Cross Validation
Model Evaluation
Hyperparameter Tuning

Data Visualization

The relationship between salary level and leaving the company:
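One way to quantify this relationship is the share of leavers per salary level. The sketch below assumes a `salary` column with text levels and a 0/1 `left` column; the rows are made up for illustration and are not the real HR data.

```python
import pandas as pd

# Made-up rows for illustration only; not the real HR data.
df = pd.DataFrame({
    "salary": ["low", "low", "low", "medium", "medium", "high"],
    "left": [1, 1, 0, 1, 0, 0],
})

# Because left is 0/1, its mean per group is the share of employees who left.
leave_rate = df.groupby("salary")["left"].mean()
print(leave_rate)
```

The resulting Series can be passed to a bar plot to get a chart of leave rate per salary level.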

The correlation matrix between the different features and the target variable
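A correlation matrix of this kind can be computed with pandas, as in the rough sketch below. The data is made up for illustration, and restricting the calculation to numeric columns is an assumption: the notebook may instead encode text columns such as `salary` first.

```python
import pandas as pd

# Made-up rows for illustration only; not the real HR data.
df = pd.DataFrame({
    "satisfaction_level": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],
    "number_project": [3, 4, 6, 7, 4, 2],
    "salary": ["low", "medium", "low", "low", "high", "medium"],
    "left": [0, 0, 1, 1, 0, 1],
})

# Pearson correlation is defined for numeric columns only,
# so text columns such as salary are excluded here.
corr = df.select_dtypes(include="number").corr()

# Correlation of each numeric feature with the target, strongest first.
target_corr = corr["left"].drop("left").sort_values(key=abs, ascending=False)
print(target_corr)
```

The full `corr` frame is what a heatmap would display; `target_corr` isolates the column that relates each feature to `left`.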

Accuracy with Cross Validation Technique:

Machine Learning Models With Parameter Tuning

Feature Importance

Accuracy with all features:

Accuracy with important features

Machine Learning Models With Cross Validation

Machine Learning Models With Parameter Tuning

Final Results:

Accuracy Comparison:

Conclusion

The results can be described as follows:

Here we compare the accuracy obtained by different classification models under different strategies. As a quick recap:
Accuracy with all features means that all features of the data were used to predict whether an employee left. This accuracy is measured on test data that was not used in training.
Accuracy with important features means the same as above, but only the 5 most important features were used. Feature importance was obtained from a Random Forest Classifier.
Accuracy with CV means the mean of the accuracies obtained across the iterations of one cross-validation run. Here, 10 iterations were used.
Accuracy with GridSearchCV means the best score obtained after tuning the model. Here, only 5 folds were used for CV.
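The four accuracy figures described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's actual code: the data is synthetic (generated with `make_classification` as a stand-in for the 9 HR features), only a Random Forest is shown, and the parameter grid and variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 9 features, binary target.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 1) Accuracy with all features, measured on held-out test data.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
acc_all = accuracy_score(y_test, model.predict(X_test))

# 2) Accuracy with the 5 most important features, ranked by the Random Forest.
top5 = np.argsort(model.feature_importances_)[::-1][:5]
model_top5 = RandomForestClassifier(n_estimators=50, random_state=0)
model_top5.fit(X_train[:, top5], y_train)
acc_top5 = accuracy_score(y_test, model_top5.predict(X_test[:, top5]))

# 3) Accuracy with CV: mean accuracy over 10 folds.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=10, scoring="accuracy",
)
acc_cv = cv_scores.mean()

# 4) Accuracy with GridSearchCV: best mean CV score over a small,
#    hypothetical grid, using 5 folds.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
grid.fit(X_train, y_train)
acc_grid = grid.best_score_

print(f"all features: {acc_all:.3f}")
print(f"top-5 features: {acc_top5:.3f}")
print(f"10-fold CV mean: {acc_cv:.3f}")
print(f"GridSearchCV best: {acc_grid:.3f} with {grid.best_params_}")
```

Note that the first two numbers are test-set accuracies, while the last two are cross-validated means, so they are not measured on the same samples and small differences between them should be read with care.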