In this project, we explore and compare tree-based methods (CART, bagging, random forest, and boosting) in terms of performance and interpretability of results on the benchmark Boston Housing dataset.
The project uses the scikit-learn, matplotlib, and seaborn libraries.
Before proceeding, make sure you have installed the necessary dependencies by running:
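To make the comparison concrete, here is a minimal sketch of the four methods fitted with scikit-learn and scored by cross-validated mean squared error. It is not the project's AutoTree/AutoEnsemble code. The function name compare_tree_methods, the hyperparameters, and the synthetic data from make_regression are all assumptions. The synthetic data stands in for Boston.csv so that the block runs on its own.

```python
# Illustrative sketch (not the project's code): compare CART, bagging,
# random forest and boosting with k-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def compare_tree_methods(X, y, cv=5, seed=0):
    """Return the mean cross-validated MSE of each tree-based method (hypothetical helper)."""
    models = {
        "CART": DecisionTreeRegressor(random_state=seed),
        # Bagging: every tree sees a bootstrap sample and all features at each split
        "Bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                                    random_state=seed),
        # Random forest: like bagging, but each split samples 1/3 of the features
        "Random forest": RandomForestRegressor(n_estimators=100, max_features=1 / 3,
                                               random_state=seed),
        # Boosting: shallow trees fitted sequentially to the residuals
        "Boosting": GradientBoostingRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # scikit-learn scorers are "higher is better", so MSE comes back negated
        neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
        scores[name] = -neg_mse.mean()
    return scores


if __name__ == "__main__":
    # Synthetic stand-in with 13 features, as in the classic Boston Housing data
    X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)
    for name, mse in compare_tree_methods(X, y).items():
        print(f"{name:>13}: CV MSE = {mse:.1f}")
```

In this sketch, bagging and random forest differ only in max_features. Bagging lets every split consider all features, while the forest samples a third of them at each split. That value is a common default for regression, not a setting taken from this project.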
pip3 install -r requirements.txt
The project is divided into:
- figures: containing the figures generated by the analysis.
- src: containing the source code (classes, methods, and functions):
  - AutoTree.py: containing the class, with its methods, that manages single decision trees.
  - AutoEnsemble.py: containing the class, with its methods, that manages decision tree ensembles.
  - utils.py: containing the support functions used in the analysis.
- BostonHousing.ipynb: containing the notebook for the analysis (you can clone the repository and run this notebook to reproduce the results yourself).
- requirements.txt: the list of Python dependencies to install with pip.
- Boston.csv: the Boston Housing data used for the comparison.
- main.py: a script for running the analysis outside the notebook.
To run the project:
- From the notebook: set the path variable in the second cell of the notebook to the location of the Boston.csv file, then run all cells.
- From main.py: set the path variable in the main.py file to the location of the Boston.csv file, then run the file.
- Translate the report into English.
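As a rough illustration of what the path variable is for, the sketch below loads the CSV at that location into a feature table and a target column. The function name load_boston_csv and the path value are hypothetical. The default target column name medv (median home value) is an assumption based on common versions of the dataset. Adjust it if Boston.csv in this repository differs.

```python
# Hypothetical sketch of how a `path` variable might be used to load Boston.csv.
# Assumes the target column is named "medv"; change target_col if the file differs.
import pandas as pd

path = "Boston.csv"  # hypothetical location: point this at your copy of the file


def load_boston_csv(path, target_col="medv"):
    """Read the CSV and split it into features X and target y."""
    df = pd.read_csv(path)
    # Some exports include an unnamed index column ("Unnamed: 0"); drop it if present
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y
```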