EEE-598-Project

Implementation of Improved Decision Tree Algorithm

GNU General Public License v3.0 (GPL-3.0)

README FOR EXECUTION INSTRUCTIONS

Ensure the following packages are installed:

  • numpy - 1.21.5
  • pandas - 1.4.4
  • scikit-learn - 1.0.2
  • matplotlib - 3.5.2
  • collections (part of the Python standard library, no separate installation needed)
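As a quick sanity check before running anything, the installed versions can be compared against the list above. The following is a minimal sketch, not part of the project code: the helper name `check_versions` and the `REQUIRED` dictionary are assumptions made for illustration, and only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. "collections" is omitted because it
# ships with Python itself.
REQUIRED = {
    "numpy": "1.21.5",
    "pandas": "1.4.4",
    "scikit-learn": "1.0.2",
    "matplotlib": "3.5.2",
}


def check_versions(required):
    """Return {package: (installed_version_or_None, expected_version)}."""
    report = {}
    for name, expected in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, expected)
    return report


if __name__ == "__main__":
    for name, (installed, expected) in check_versions(REQUIRED).items():
        status = "OK" if installed == expected else "MISMATCH"
        print(f"{name}: installed={installed}, expected={expected} [{status}]")
```

A "MISMATCH" line does not necessarily mean the code will fail; it only flags a difference from the versions listed here.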

Executing Decision_trees_STak:

  • Breast Cancer data-set: This data-set ships with sklearn, so no file or path changes are needed. Uncomment the "Breast Cancer Data" section in the code and keep the sections for the other data-sets commented out.

  • Car evaluation data-set: Place the file "car_evaluation.csv", which is uploaded on Canvas, in the same directory as the code. Then, in the "Car Data" section of the code, update the file path used for "df" so that it points to that directory. No other changes are needed.

  • Data-set used in paper: Comment out the sections for the other data-sets and uncomment the "Custom" data-set section.
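For the car evaluation step, a wrong path is the most likely failure, so it can help to resolve and check the CSV location before loading it. The sketch below is an assumption about how this could be done and is not taken from the project code: `resolve_csv_path` and `load_car_data` are hypothetical helper names, and the column layout of "car_evaluation.csv" is not assumed.

```python
from pathlib import Path


def resolve_csv_path(filename="car_evaluation.csv", base_dir=None):
    """Return the absolute path to `filename` inside `base_dir`.

    `base_dir` defaults to the current working directory; pass the
    directory that holds the code if you run it from somewhere else.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Expected {filename} in {base.resolve()}")
    return path


def load_car_data(base_dir=None):
    """Read the car evaluation CSV into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so the path check works without pandas

    return pd.read_csv(resolve_csv_path(base_dir=base_dir))
```

With this in place, `df = load_car_data()` would raise a clear `FileNotFoundError` naming the directory that was searched instead of failing later inside the tree-building code.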

Executing Decision Tree Notebook Folders

Car Evaluation dataset

  • The dataset is stored in the 'data' folder shared in the zip file
  • Navigate to folder 'Decision_tree_car_sales'
  • Execute the file 'DecisionTree.ipynb'

Breast Cancer Dataset

  • The dataset is bundled with the scikit-learn package, so no download is needed.
  • Navigate to folder 'Decision_Tree_Breast_Cancer'
  • Execute the file 'DecisionTree.ipynb'
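To confirm that the bundled breast cancer data is available before opening the notebook, it can be loaded directly from scikit-learn. This is a minimal sketch and not the notebook's own code: the function name `load_breast_cancer_arrays` is an assumption, while `sklearn.datasets.load_breast_cancer` is the standard scikit-learn loader.

```python
from sklearn.datasets import load_breast_cancer


def load_breast_cancer_arrays():
    """Return (X, y) as NumPy arrays from scikit-learn's bundled dataset."""
    X, y = load_breast_cancer(return_X_y=True)
    return X, y


if __name__ == "__main__":
    X, y = load_breast_cancer_arrays()
    # The bundled dataset has 569 samples with 30 numeric features each
    # and a binary target.
    print("features:", X.shape)
    print("labels:", sorted(set(y.tolist())))
```

If this runs without an import error, the scikit-learn installation is sufficient for the breast cancer notebook's data loading step.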

Dataset used by the paper / Custom Dataset

  • The dataset is created in the notebook.
  • Navigate to folder 'Decision_Tree_Custom_data'
  • Execute the file 'Decision_Tree_Custom.ipynb'