/tree_models_productivity

Predicting employee productivity using tree models (decision tree classification, cross-validation, minimal cost-complexity pruning, random forest)

Primary Language: Jupyter Notebook

Predicting Employee Productivity Using Tree Models

For this project, we'll be using the dataset Productivity Prediction of Garment Employees. The original dataset is in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees). Below is a description of the dataset, according to its official summary:

"The Garment Industry is one of the key examples of the industrial globalization of this modern era. It is a highly labour-intensive industry with lots of manual processes. Satisfying the huge global demand for garment products is mostly dependent on the production and delivery performance of the employees in the garment manufacturing companies. So, it is highly desirable among the decision makers in the garments industry to track, analyse and predict the productivity performance of the working teams in their factories."

For analysis we have the following data columns:

  • date : Date in MM-DD-YYYY
  • day : Day of the Week
  • quarter : A portion of the month. A month was divided into four quarters
  • department : Associated department with the instance
  • team_no : Associated team number with the instance
  • no_of_workers : Number of workers in each team
  • no_of_style_change : Number of changes in the style of a particular product
  • targeted_productivity : Targeted productivity set by the Authority for each team for each day.
  • smv : Standard Minute Value, the time allocated for a task
  • wip : Work in progress. Includes the number of unfinished items for products
  • over_time : Represents the amount of overtime by each team in minutes
  • incentive : Represents the amount of financial incentive (in BDT) that enables or motivates a particular course of action.
  • idle_time : The amount of time when the production was interrupted due to several reasons
  • idle_men : The number of workers who were idle due to production interruption
  • actual_productivity : The actual productivity delivered by the workers, expressed as a fraction between 0 and 1.
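
As a rough illustration of how these columns could feed a classifier, the sketch below turns them into a feature matrix and a binary target. It is not taken from the notebook: the helper name `prepare_features`, the "productive" label (actual productivity at or above the target), and the choice to drop `date` and fill missing `wip` with 0 are all assumptions made for this example, and the three-row frame is synthetic.

```python
import pandas as pd


def prepare_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: build (X, y) from the columns listed above."""
    df = df.copy()
    # Assumption: a team counts as "productive" when it meets or beats its target.
    y = (df["actual_productivity"] >= df["targeted_productivity"]).astype(int)
    y = y.rename("productive")
    # Assumption: treat a missing work-in-progress count as zero.
    df["wip"] = df["wip"].fillna(0)
    # Drop the raw date (day and quarter already carry calendar information)
    # and the column the target was derived from.
    X = df.drop(columns=["date", "actual_productivity"])
    # One-hot encode the categorical columns so tree models can consume them.
    X = pd.get_dummies(X, columns=["day", "quarter", "department"])
    return X, y


# Synthetic rows for illustration only; these are not values from the dataset.
demo = pd.DataFrame({
    "date": ["01-01-2015", "01-02-2015", "01-03-2015"],
    "day": ["Thursday", "Saturday", "Sunday"],
    "quarter": ["Quarter1", "Quarter1", "Quarter1"],
    "department": ["sewing", "finishing", "sewing"],
    "team_no": [1, 2, 3],
    "no_of_workers": [55.0, 8.0, 30.0],
    "no_of_style_change": [0, 0, 1],
    "targeted_productivity": [0.80, 0.75, 0.70],
    "smv": [26.16, 3.94, 11.41],
    "wip": [1100.0, None, 900.0],
    "over_time": [7000, 960, 3600],
    "incentive": [90, 0, 40],
    "idle_time": [0.0, 0.0, 0.0],
    "idle_men": [0, 0, 0],
    "actual_productivity": [0.85, 0.70, 0.70],
})

X_demo, y_demo = prepare_features(demo)
```

Calling `prepare_features` on the full dataset would return the same structure, with one indicator column per distinct day, quarter, and department value.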

The procedure is described in the attached notebook tree_models_productivity.ipynb.
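
For readers who want a quick picture of the techniques named in the project description, here is a minimal sketch of a decision tree with cross-validation, minimal cost-complexity pruning, and a random forest in scikit-learn. It is an illustration under stated assumptions, not the notebook's code: the data comes from `make_classification` as a synthetic stand-in for the garment dataset, and the split sizes, fold count, and forest size are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the real features would replace this).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 1. Unpruned decision tree, evaluated with 5-fold cross-validation.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_train, y_train, cv=5
)

# 2. Minimal cost-complexity pruning: compute the candidate alphas,
#    then keep the one with the best mean cross-validated accuracy.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off
    candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    score = cross_val_score(candidate, X_train, y_train, cv=5).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned_tree = DecisionTreeClassifier(
    random_state=0, ccp_alpha=best_alpha
).fit(X_train, y_train)

# 3. Random forest for comparison.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_train, y_train
)

print("unpruned CV accuracy:", cv_scores.mean())
print("pruned tree leaves:", pruned_tree.get_n_leaves(),
      "of", full_tree.get_n_leaves())
print("pruned test accuracy:", pruned_tree.score(X_test, y_test))
print("forest test accuracy:", forest.score(X_test, y_test))
```

Selecting `ccp_alpha` by cross-validation on the training split keeps the test split untouched until the final comparison between the pruned tree and the forest.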
