
This repository contains code to build several versions of decision tree models (Gini, entropy) for the Titanic passenger survival prediction competition on Kaggle. Pruning is also applied to reduce overfitting.


DecisionTrees-Gini-Entropy-TitanicSurvivalPrediction

This repository focuses on building decision trees (Gini, entropy) for the Titanic survival prediction competition on Kaggle.

Here is the link to the Kaggle competition.

https://www.kaggle.com/c/titanic

I used this data to build several decision tree models, apply pruning to reduce their overfitting, identify the most significant features that contributed to the survival of the Titanic passengers, and predict passenger survival on unseen data for submission to the Kaggle competition.

The code used to build the decision trees is in the notebook "DecisionTrees_TitanicSurvivalPrediction.ipynb" in this repository.


The data used to train and test the models is included in this directory (train_titanic.csv, test_titanic.csv, gender_submission.csv).

Decision Tree Models

The code contains three versions of decision tree models:

Model 1: Decision Tree with Gini Index Criterion

Model 2: Decision Tree with Entropy Criterion

Model 3: Decision Tree with reduced depth (pruning) to reduce the overfitting seen in Model 1 and Model 2
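
As a rough illustration of the three models listed above, here is a minimal sketch using scikit-learn's `DecisionTreeClassifier`. It is not the notebook's code: the data is a synthetic stand-in generated with `make_classification` (an assumption, so the block runs without the CSV files), and `max_depth=3` is an arbitrary illustrative value, not necessarily the depth chosen in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Titanic features (assumption:
# the real notebook loads and encodes train_titanic.csv instead).
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Model 1: fully grown tree, splits chosen by Gini impurity.
    "gini": DecisionTreeClassifier(criterion="gini", random_state=0),
    # Model 2: fully grown tree, splits chosen by entropy (information gain).
    "entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    # Model 3: depth-limited tree; max_depth=3 is an illustrative value only.
    "pruned": DecisionTreeClassifier(
        criterion="gini", max_depth=3, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "depth": model.get_depth(),
        "train_acc": model.score(X_train, y_train),
        "val_acc": model.score(X_val, y_val),
    }
    print(name, results[name])
```

A fully grown tree typically fits the training split perfectly while scoring lower on held-out data, which is the overfitting that limiting the depth is meant to reduce. After fitting, `model.feature_importances_` gives the relative contribution of each feature to the splits.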

Finally, the pruned decision tree (Model 3) is used to predict survival on unseen data for the final submission to the competition.
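
The submission step could look roughly like the sketch below. The Kaggle Titanic competition expects a CSV with a `PassengerId` column and a `Survived` column (the same layout as gender_submission.csv). The helper name `write_submission` and the default output path are hypothetical and do not come from the notebook.

```python
import csv


def write_submission(passenger_ids, predictions, path="submission.csv"):
    """Write predictions in the PassengerId,Survived layout Kaggle expects.

    `write_submission` is a hypothetical helper for illustration only.
    """
    if len(passenger_ids) != len(predictions):
        raise ValueError("passenger_ids and predictions must be the same length")

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Survived"])
        for pid, pred in zip(passenger_ids, predictions):
            # Cast to int so NumPy scalars are written as plain 0/1 values.
            writer.writerow([int(pid), int(pred)])
    return path
```

With a fitted model, this might be called as `write_submission(test_df["PassengerId"], model.predict(X_test))`, where `test_df` and `X_test` are assumed names for the loaded test_titanic.csv and its encoded feature matrix.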