Attrition analysis using different models

Primary language: Jupyter Notebook

Data-Analytics-Mini-Project

The repository is structured as follows:

  1. Attrition analysis.ipynb: The final set of six models, in this order: Logistic Regression, Naive Bayes, KNN, SVM, XGBoost, Random Forest Classifier
  2. EDA & Visualisation.ipynb: The compiled EDA and visualisation; only the important visualisations are included here
  3. Presentation: The final ppt used in the video
  4. Debug Phase 1: The individual EDA files of each team member
  5. Debug Phase 2: The individual models used for testing and debugging
  6. Datasets: The original dataset (hr_dataset) and the cleaned dataset (final_dataset)

Link to dataset: https://www.kaggle.com/ghoshanisha/ibm-employee-attrition

No hyperparameter tuning is required to run the models (we have already done that for you). Before running, change the path to the dataset in both files 1 and 2 listed above.

Use the final_dataset for training and testing.
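
As a rough illustration of keeping the dataset path in one place, here is a minimal standard-library sketch. The file name `final_dataset.csv`, the `Datasets` folder location relative to the working directory, the `ATTRITION_DATA` environment variable, and the helper names are all assumptions made for this example. The notebooks themselves may load and split the data differently (for example with pandas), and the 80/20 split is only a placeholder, not the split used in the project.

```python
import csv
import os
import random
from pathlib import Path

# Assumption: the cleaned dataset is a CSV file with this name and location.
DEFAULT_DATASET = Path("Datasets") / "final_dataset.csv"


def dataset_path() -> Path:
    """Return the dataset path; the hypothetical ATTRITION_DATA variable overrides it."""
    return Path(os.environ.get("ATTRITION_DATA", DEFAULT_DATASET))


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off a held-out test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    path = dataset_path()
    if path.exists():
        train, test = train_test_split(load_rows(path))
        print(f"{len(train)} training rows, {len(test)} test rows from {path}")
    else:
        print(f"Dataset not found at {path}; set ATTRITION_DATA or edit DEFAULT_DATASET")
```

With a layout like this, only `DEFAULT_DATASET` (or the environment variable) needs to change when the repository is cloned to a different location.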

Please consider files 1 and 2 for the final evaluation. The debug folders are included to highlight the process, not for evaluation.

Phase 1 report: This report contains the extensive literature survey along with all the relevant analysis found after performing EDA.

Link to phase 1 report: https://docs.google.com/document/d/1sZ_t2Iv62C-bjt82gjArRB77d7p1ttzy/edit?usp=sharing&ouid=105716424213299764928&rtpof=true&sd=true

Phase 2 report: This report contains the relevant papers along with the detailed methodology for implementing our solution approach.

Link to phase 2 report: https://docs.google.com/document/d/1GJGtkHT5UfOSzl5Y6lf2Y2oSij281bt2/edit?usp=sharing&ouid=105716424213299764928&rtpof=true&sd=true

Link to video: https://drive.google.com/file/d/1rPj3NRbsOHy2o29OEhxMEY7gs8UPa0a4/view?usp=sharing

Link to plagiarism report: https://drive.google.com/drive/folders/1O5eJ0emwp3QlSStglR3xPVEnlzPb8xhJ?usp=sharing