This repository builds decision trees (Gini and entropy criteria) for the Titanic survival prediction competition on Kaggle.
Here is the link to the Kaggle competition.
https://www.kaggle.com/c/titanic
I used this data to build several decision tree models, apply pruning to reduce overfitting, identify the features that contributed most to passenger survival, and predict survival on unseen data for submission to the Kaggle competition.
The code used to build the decision trees is in the notebook "DecisionTrees_TitanicSurvivalPrediction.ipynb" in this repository.
The data used to train and test the models is included in this directory (train_titanic.csv, test_titanic.csv, gender_submission.csv).
The notebook contains three decision tree models:
Model 1: Decision Tree with Gini Index Criterion
Model 2: Decision Tree with Entropy Criterion
Model 3: Decision Tree with reduced depth (Pruning) to reduce the overfitting seen in Model 1 and Model 2
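
For readers unfamiliar with the two split criteria named in Model 1 and Model 2, here is a minimal sketch of how Gini impurity and entropy are computed for a set of class labels. It is not taken from the notebook; the function names gini and entropy are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


# A node with an even split between survived (1) and not survived (0)
# is maximally impure under both measures; a pure node scores zero.
mixed = [0, 0, 1, 1]
pure = [1, 1, 1]
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an even class split, so a tree that minimizes either one prefers splits that separate survivors from non-survivors.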
Finally, the pruned decision tree (Model 3) is used to predict survival on the unseen test data for the final submission to the competition.
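
The three models can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: it uses synthetic data from make_classification instead of the Titanic files, and the max_depth=3 limit and the Gini criterion for the pruned tree are assumed values, since this README does not state which depth or criterion the notebook uses.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic features (assumption: not the real data).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Model 1: fully grown tree with the Gini criterion.
    "gini": DecisionTreeClassifier(criterion="gini", random_state=0),
    # Model 2: fully grown tree with the entropy criterion.
    "entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    # Model 3: depth-limited tree; max_depth=3 is an assumed value.
    "pruned": DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # A large gap between train and test accuracy suggests overfitting.
    print(
        name,
        "depth:", model.get_depth(),
        "train:", model.score(X_train, y_train),
        "test:", model.score(X_test, y_test),
    )

# Feature importances of the pruned tree and its predictions on held-out data.
importances = models["pruned"].feature_importances_
predictions = models["pruned"].predict(X_test)
print(importances)
```

Limiting depth is a form of pre-pruning: the tree stops growing early, so it cannot memorize the training set as closely as the fully grown trees can.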