The sinking of the RMS Titanic in 1912 claimed the lives of roughly 1,500 passengers and crew members. Our objective is to build predictive models that determine whether a passenger survived or perished based on features such as age, gender, and ticket class. By analyzing this dataset with supervised learning algorithms, we aim to identify the factors that shaped survival.
We also build a Streamlit app that uses these models to predict an individual passenger's chance of survival. Check out the demo here
The modeling notebook is present in this location
We perform extensive EDA and feature engineering, try several classification models, and then fine-tune four models that achieve high accuracy.
Take a look at the model metrics, confusion matrices, and ROC curves for the four hyperparameter-tuned models:
- Logistic Regression
- Random Forest Classifier
- XGBoost
- Bagging Classifier
We use the joblib library to save the models. The saved models are stored here
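As a rough illustration of saving and reloading a model with joblib, the sketch below fits a logistic regression on toy data and round-trips it through a file. The data, the file name, and the temporary directory are assumptions made for the example; they are not the features or paths used in this repository.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression

# Toy stand-in data (NOT the Titanic features): one numeric feature, binary label.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, chosen only for this example.
    path = str(Path(tmp) / "logistic_regression.joblib")
    joblib.dump(model, path)       # serialize the fitted estimator to disk
    restored = joblib.load(path)   # load it back, as an app would at startup

# The restored model should reproduce the original model's predictions.
same_predictions = list(model.predict(X)) == list(restored.predict(X))
print(same_predictions)
```

A model saved this way should generally be loaded with the same library versions it was saved with.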
The following inputs are required for prediction:
- Title (Mr, Miss, Mrs, Master, Other)
- Passenger Class
- Port of Embarkation
- Gender
- Age
- Fare
- Number of Parents/Children
- Number of Siblings/Spouses
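The inputs listed above could be collected and sanity-checked along these lines. This is only a sketch: the `PassengerInput` container, its field names, and the `validate` helper are hypothetical, and the allowed values for class, port, and gender are assumed from the public Kaggle Titanic dataset rather than taken from this project's code.

```python
from typing import List, NamedTuple

# TITLES comes from the list above; the other sets are assumptions
# based on the public Kaggle Titanic dataset.
TITLES = {"Mr", "Miss", "Mrs", "Master", "Other"}
CLASSES = {1, 2, 3}
PORTS = {"C", "Q", "S"}  # Cherbourg, Queenstown, Southampton
GENDERS = {"male", "female"}


class PassengerInput(NamedTuple):
    """Hypothetical container for the eight prediction inputs."""
    title: str
    pclass: int
    embarked: str
    gender: str
    age: float
    fare: float
    parch: int  # number of parents/children aboard
    sibsp: int  # number of siblings/spouses aboard


def validate(p: PassengerInput) -> List[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if p.title not in TITLES:
        problems.append(f"unknown title: {p.title!r}")
    if p.pclass not in CLASSES:
        problems.append(f"unknown passenger class: {p.pclass!r}")
    if p.embarked not in PORTS:
        problems.append(f"unknown port: {p.embarked!r}")
    if p.gender not in GENDERS:
        problems.append(f"unknown gender: {p.gender!r}")
    if p.age < 0 or p.fare < 0:
        problems.append("age and fare must be non-negative")
    if p.parch < 0 or p.sibsp < 0:
        problems.append("family counts must be non-negative")
    return problems


ok = PassengerInput("Mrs", 1, "C", "female", 38.0, 71.28, 0, 1)
bad = PassengerInput("Dr", 4, "S", "male", 40.0, 10.0, 0, 0)
print(validate(ok))
print(validate(bad))
```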
The Streamlit application accepts a passenger's details and predicts the probability of survival. A 50% cut-off separates 'survived' from 'did not survive'.
Predictions can be made with any one of the four available models. The application also displays the metrics, confusion matrix, and ROC curve for the selected model.
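The 50% cut-off can be expressed as a small function. This is a minimal sketch: the function name `classify` is hypothetical, and treating a probability of exactly 50% as 'survived' is an assumption, since the tie-breaking rule is not specified here.

```python
def classify(probability: float, cutoff: float = 0.5) -> str:
    """Map a survival probability to one of the two labels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Assumption: a probability exactly at the cut-off counts as 'survived'.
    return "survived" if probability >= cutoff else "did not survive"


for p in (0.12, 0.50, 0.87):
    print(f"{p:.0%} -> {classify(p)}")
```

Keeping the cut-off as a parameter makes it easy to experiment with a stricter or looser threshold without touching the model itself.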
Prerequisite: Python 3.6 (or higher)
Clone Repo
git clone https://github.com/abhinav-kimothi/Titanic-Survival.git
Navigate to Dir
cd Titanic-Survival
Create and Activate Virtual Environment:
python3 -m venv .env
For Linux/macOS
source .env/bin/activate
For Windows
.env\Scripts\activate
Install Requirements:
pip install -r requirements.txt
Run Streamlit Locally
streamlit run src/main.py