The main task was to build a simple decision tree model that predicts crime rate using scikit-learn's DecisionTreeRegressor.
Four regression models were built using the original and modified datasets, and the resulting trees were inspected. Model complexity was then reduced by varying the regressor's max_depth parameter.
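A minimal sketch of how max_depth limits tree complexity. The data here is synthetic and stands in for the project's dataset; the feature count, the target formula, and the depth values tried are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 200 samples, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)


def fit_tree(max_depth):
    """Fit a regression tree whose depth is capped at max_depth."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    return model.fit(X, y)


# max_depth=None lets the tree grow fully; smaller values cap its complexity.
trees = {depth: fit_tree(depth) for depth in (None, 5, 3, 2)}

for depth, model in trees.items():
    print(f"max_depth={depth}: depth={model.get_depth()}, "
          f"nodes={model.tree_.node_count}")
```

A shallower tree has fewer nodes and is easier to read when exported with graphviz, at the cost of a coarser fit.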
The importance of specific features was also analyzed, and an interesting pattern emerged when the region feature was split into four new features, one for each region. This made the algorithm's choice of geographic regions 1 and 4 clearer when making a prediction. The results could be improved with a larger dataset.
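The region split described above can be sketched as follows. The column names (income, region), the synthetic target, and the chosen max_depth are hypothetical and not taken from the project's data; the sketch only shows one way to turn a single region code into four indicator features and read the tree's feature importances.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): one numeric feature and a region code 1..4.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "income": rng.normal(size=n),
    "region": rng.integers(1, 5, size=n),
})

# Hypothetical target: regions 1 and 4 shift the value in opposite directions.
crime = (
    2.0 * df["income"]
    + 1.5 * (df["region"] == 1)
    - 1.5 * (df["region"] == 4)
    + rng.normal(scale=0.3, size=n)
)

# Replace the single region column with four indicator columns, one per region.
X = pd.get_dummies(df, columns=["region"], prefix="region", dtype=float)

model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, crime)

# Importances are normalized to sum to 1 across all features.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

With separate indicator columns, each region receives its own importance score instead of sharing one score for the combined region code.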
Finally, the performance of the models was measured by generating and analyzing the learning curves.
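A hedged sketch of generating learning curves with scikit-learn's learning_curve helper. The data is synthetic, and the max_depth, fold count, and R² scoring are assumptions rather than the project's actual settings.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 300 samples, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

# Score the model on growing training subsets using 5-fold cross-validation.
train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(max_depth=3, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    scoring="r2",
)

# Average over the folds to get one curve point per training size.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for size, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"n={size}: train R2={tr:.3f}, validation R2={va:.3f}")
```

Plotting train_mean and val_mean against train_sizes with matplotlib gives the usual learning-curve figure; a large gap between the two curves suggests overfitting, while two low curves that converge suggest underfitting.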
All that said, for a first machine learning model, I think it was a very positive and exciting experience.
Libraries used in this project:
numpy
pandas
matplotlib
seaborn
sklearn
graphviz
pydotplus
io