In this project, I examined a dataset and developed predictive models to generate insights for the Human Resources (HR) department of a major consulting firm. I employed various machine learning models to forecast employee attrition, evaluated their performance, and selected the best-performing model for deployment.
Salifort Motors' HR department aims to enhance employee satisfaction and has gathered data from their workforce but needs guidance on how to utilize it effectively. They seek data-driven insights to understand the factors influencing employee turnover, specifically asking, "What’s likely to make an employee leave the company?"
My objectives in this project are to analyze the collected data and develop a predictive model to determine whether an employee is likely to leave. By identifying employees at risk of quitting, we can uncover the underlying reasons for their departure. Since recruiting new employees is both time-consuming and costly, improving employee retention will be advantageous for the company.
The dataset contains 14,999 rows and 10 columns. Its features include employee-reported job satisfaction (ranging from 0 to 1), the score from the employee's most recent performance review (ranging from 0 to 1), the number of projects the employee is involved in, the average number of hours worked per month, and an indicator of whether the employee left the company.
The visualization below shows a stacked boxplot of average_monthly_hours distributions by number_project, comparing employees who stayed with those who left.
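A plot of this kind could be produced roughly as follows. This is a minimal sketch, not the project's actual plotting code: the data is synthetic, the column name left (1 = left, 0 = stayed) is an assumption, and the helper plot_hours_by_projects is hypothetical. It draws one box per project count for each group, placed side by side.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

# Synthetic stand-in data (not the HR dataset); `left` is an assumed column name.
rng = np.random.default_rng(0)
n = 600
projects = rng.integers(2, 8, size=n)  # project counts 2 through 7
df = pd.DataFrame({
    "number_project": projects,
    "average_monthly_hours": 150 + 15 * projects + rng.normal(0, 30, size=n),
    "left": rng.integers(0, 2, size=n),  # 1 = left, 0 = stayed
})


def plot_hours_by_projects(data):
    """Draw one box per (number_project, left) pair, with the two groups side by side."""
    fig, ax = plt.subplots(figsize=(8, 5))
    project_counts = sorted(int(p) for p in data["number_project"].unique())
    # flag -> (legend label, box color, horizontal offset from the integer tick)
    styles = {0: ("stayed", "tab:blue", -0.2), 1: ("left", "tab:orange", 0.2)}
    for flag, (label, color, offset) in styles.items():
        samples = [
            data.loc[
                (data["number_project"] == p) & (data["left"] == flag),
                "average_monthly_hours",
            ].to_numpy()
            for p in project_counts
        ]
        ax.boxplot(
            samples,
            positions=[p + offset for p in project_counts],
            widths=0.35,
            patch_artist=True,  # filled boxes so each group can be colored
            boxprops={"facecolor": color},
            manage_ticks=False,  # ticks are set explicitly below
        )
    ax.set_xticks(project_counts)
    ax.set_xlabel("number_project")
    ax.set_ylabel("average_monthly_hours")
    ax.legend(handles=[Patch(facecolor=c, label=l) for l, c, _ in styles.values()])
    return fig, ax


fig, ax = plot_hours_by_projects(df)
```

Calling fig.savefig with a file name would write the figure to disk. With the real data, df would be replaced by the loaded HR table.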
The scatterplot below shows average_monthly_hours versus satisfaction_level, comparing employees who stayed with those who left.
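As an illustration, a scatterplot like this can be drawn by plotting each group separately so that it gets its own color and legend entry. The sketch below uses synthetic data and assumes the attrition flag is a column named left (1 = left, 0 = stayed); it is not the project's original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (not the HR dataset); `left` is an assumed column name.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "average_monthly_hours": rng.normal(200, 45, size=n),
    "satisfaction_level": rng.uniform(0, 1, size=n),
    "left": rng.integers(0, 2, size=n),  # 1 = left, 0 = stayed
})

fig, ax = plt.subplots(figsize=(8, 5))
for flag, label, color in [(0, "stayed", "tab:blue"), (1, "left", "tab:orange")]:
    subset = df[df["left"] == flag]
    ax.scatter(
        subset["average_monthly_hours"],
        subset["satisfaction_level"],
        s=10,
        alpha=0.4,  # transparency keeps dense regions readable
        color=color,
        label=label,
    )
ax.set_xlabel("average_monthly_hours")
ax.set_ylabel("satisfaction_level")
ax.legend()
```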
A stacked histogram compares the department distribution of employees who left with that of employees who stayed.
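One way to build such a chart is to cross-tabulate department against the attrition flag and plot the counts as stacked bars. In the sketch below, the column names department and left, the department labels, and the data itself are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names and department labels are hypothetical.
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "department": rng.choice(["sales", "technical", "support", "IT"], size=n),
    "left": rng.integers(0, 2, size=n),  # 1 = left, 0 = stayed
})

# Rows are departments, columns are the two groups, cells are head counts.
counts = pd.crosstab(df["department"], df["left"]).rename(
    columns={0: "stayed", 1: "left"}
)

fig, ax = plt.subplots(figsize=(8, 5))
counts.plot(kind="bar", stacked=True, ax=ax)  # one stacked bar per department
ax.set_xlabel("department")
ax.set_ylabel("employees")
```

Dividing each row of counts by its row sum would give proportions instead of head counts, which makes departments of different sizes easier to compare.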
A heatmap visualizes how strongly the variables are correlated with one another, focusing on the variables whose correlations are of interest.
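A correlation heatmap can be sketched by computing the pairwise Pearson correlation matrix and rendering it as a colored grid. The example below uses synthetic data and an arbitrary selection of four columns as a stand-in for the variables of interest; it does not reproduce the project's figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (not the HR dataset).
rng = np.random.default_rng(3)
n = 500
projects = rng.integers(2, 8, size=n)
df = pd.DataFrame({
    "satisfaction_level": rng.uniform(0, 1, size=n),
    "last_evaluation": rng.uniform(0, 1, size=n),
    "number_project": projects,
    "average_monthly_hours": 150 + 15 * projects + rng.normal(0, 30, size=n),
})

corr = df.corr()  # pairwise Pearson correlation of the numeric columns

fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
# Annotate each cell with its coefficient.
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax)
```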
A random forest model with 300 decision trees was used to identify the key features influencing employee turnover. The plot below indicates that in this random forest model, last_evaluation, number_project, tenure, and overworked are the most important predictors, listed in descending order of importance. These features are also the top predictors in the decision tree model. The random forest model achieved an AUC of 94% and a recall of 90%.
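The workflow described here (fit a 300-tree random forest, rank features by importance, then report AUC and recall) might look like the following scikit-learn sketch. Everything except the tree count and the feature names is an assumption: the data is synthetic, overworked is treated as a hypothetical binary flag, the target is generated by a toy rule, and the hyperparameters other than n_estimators are library defaults. The scores it produces are unrelated to the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features; `overworked` is assumed to be a 0/1 flag.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "last_evaluation": rng.uniform(0, 1, size=n),
    "number_project": rng.integers(2, 8, size=n),
    "tenure": rng.integers(2, 11, size=n),
    "overworked": rng.integers(0, 2, size=n),
})

# Toy target (1 = left): a noisy function of the features, for illustration only.
logit = 0.6 * (X["number_project"] - 4) + 0.3 * (X["tenure"] - 5) + X["overworked"] - 0.5
y = (rng.uniform(0, 1, size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# AUC uses predicted probabilities of the positive class; recall uses hard labels.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
recall = recall_score(y_test, model.predict(X_test))

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(importances)
print(f"AUC: {auc:.3f}, recall: {recall:.3f}")
```

Impurity-based importances sum to 1 across features, so each value can be read as a share of the model's total split quality; they can be biased toward features with many distinct values, which is worth keeping in mind when comparing continuous and binary columns.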
This model can assist the HR department in predicting which employees are likely to leave the company. Additionally, it can help identify the factors contributing to employee turnover, enabling the HR team to take actions to improve retention. This is particularly valuable as recruiting, interviewing, and hiring new employees is both time-consuming and costly.