Titanic: A Complete Tutorial of Data Science

Introduction

This repository is a comprehensive, step-by-step guide to one of the best-known and most historically rich datasets in data science and machine learning: the Titanic dataset. It walks through understanding, analyzing, and predicting outcomes based on data from the sinking of the Titanic in 1912.

The project offers an accessible yet insightful introduction to machine learning for beginners and intermediate learners who want to strengthen their data science skills. By working through the included notebooks, readers gain hands-on experience with real-world data preprocessing, exploratory data analysis (EDA), feature engineering, and the application of various machine learning models.
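
To give a feel for the preprocessing, EDA, and feature engineering steps mentioned above, here is a minimal sketch using pandas. It is not the code from the notebooks. The rows are made up for illustration, the column names assume the widely used Kaggle Titanic CSV layout, and the derived columns `FamilySize` and `IsFemale` are hypothetical examples of engineered features.

```python
import pandas as pd

# Made-up rows for illustration only; NOT real passenger records.
# Column names assume the public Kaggle Titanic CSV layout.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Embarked": ["S", "C", None, "S"],
})

# Preprocessing: fill missing ages with the median age and
# missing ports with the most frequent port.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Feature engineering (hypothetical features): family size and a numeric sex flag.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsFemale"] = (df["Sex"] == "female").astype(int)

# EDA: survival rate per group.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

With the real dataset, the same steps would start from `pd.read_csv(...)` on whichever file the notebooks load, rather than from an inline DataFrame.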

Environment Setup

Prerequisites: Ensure Python 3.6 or newer is installed on your system.
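
If you are unsure which interpreter is active, a short script like the one below can check it against the prerequisite. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of this repository.

```python
import sys

# Minimum version stated in the prerequisites.
MINIMUM_VERSION = (3, 6)


def meets_minimum(version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, ...) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper used only for illustration.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    # str.format is used instead of f-strings so the script also runs
    # (and reports the problem) on interpreters older than 3.6.
    if meets_minimum(sys.version_info):
        print("Python {} satisfies the 3.6+ requirement.".format(current))
    else:
        print("Python {} is too old; install 3.6 or newer.".format(current))
```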

  1. Create a Virtual Environment:

    • Install virtualenv if you prefer it over the built-in venv (optional):
      pip install virtualenv
    • Create the environment:
      • With venv (Python 3.3+):
        python -m venv env
      • Or, with virtualenv:
        virtualenv env
    • Activate the environment:
      • Windows: env\Scripts\activate
      • Unix/macOS: source env/bin/activate
    • To deactivate: deactivate
  2. Install Dependencies: All dependencies are listed in requirements.txt. With the environment activated, install them using:

    pip install -r requirements.txt

Installation Instructions

To use this project, clone the repository and set up the environment as follows:

  1. Clone the Repository:
    git clone https://github.com/Imran-ml/Titanic-A-Complete-Tutorial-of-Data-Science.git
  2. Set Up the Environment:
    • Navigate to the project directory and activate the virtual environment.
    • Install the dependencies from requirements.txt.

Evaluation

     Random Forest       KNN  Decision Tree
0         0.821229  0.759777       0.837989
1         0.797753  0.713483       0.814607
2         0.820225  0.741573       0.831461
3         0.786517  0.735955       0.786517
4         0.848315  0.803371       0.831461
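
The rows of this table are easier to compare once they are condensed into per-model statistics. The following sketch uses only the Python standard library; the values are copied from the table, and the names `scores` and `summary` are illustrative.

```python
from statistics import mean, stdev

# Scores copied from the Evaluation table, one list per model (rows 0 to 4).
scores = {
    "Random Forest": [0.821229, 0.797753, 0.820225, 0.786517, 0.848315],
    "KNN": [0.759777, 0.713483, 0.741573, 0.735955, 0.803371],
    "Decision Tree": [0.837989, 0.814607, 0.831461, 0.786517, 0.831461],
}

# Mean, sample standard deviation, and range for each model.
summary = {
    name: {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }
    for name, values in scores.items()
}

for name, stats in summary.items():
    print(
        "{:<14} mean={:.4f} std={:.4f} min={:.4f} max={:.4f}".format(
            name, stats["mean"], stats["std"], stats["min"], stats["max"]
        )
    )
```

Looking at the spread alongside the mean helps judge how consistent each model is across rows, not just how high it scores on average.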

Resources

License

This project is made available under the MIT License.

Conclusion

Our exploration of the Titanic dataset with Random Forest, KNN, and Decision Tree algorithms has provided valuable insights into machine learning applications. In the Evaluation scores above, Random Forest was robust and consistent, ranging from 0.787 to 0.848, which makes it a standout choice for this dataset. Decision Trees reached similarly high scores (up to 0.838), but they also indicated a risk of overfitting. KNN scored lowest in every row (0.713 to 0.803), yet it remains a useful tool for understanding dataset nuances.
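
One common way to probe the overfitting risk mentioned above is to compare training accuracy with cross-validated accuracy for each model. The sketch below assumes scikit-learn, which this README does not confirm the notebooks use, and it runs on synthetic stand-in data from `make_classification` rather than the Titanic dataset. The hyperparameters are illustrative defaults, not the project's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise; NOT the Titanic dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, flip_y=0.1, random_state=0
)

# Illustrative settings (assumptions), not the notebooks' configuration.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),  # unlimited depth
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation, keeping both training and validation accuracy.
    cv = cross_validate(
        model, X, y, cv=5, scoring="accuracy", return_train_score=True
    )
    results[name] = {
        "train": float(np.mean(cv["train_score"])),
        "validation": float(np.mean(cv["test_score"])),
    }

for name, res in results.items():
    gap = res["train"] - res["validation"]
    print(
        "{:<14} train={:.3f} validation={:.3f} gap={:.3f}".format(
            name, res["train"], res["validation"], gap
        )
    )
```

A large gap between training and validation accuracy suggests that a model is memorizing the training folds. An unconstrained decision tree fits its training data perfectly here, so limiting `max_depth` is a natural first thing to tune.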

This project highlights the importance of algorithm selection and tuning in predictive modeling. We encourage further experimentation and learning within this fascinating field of data science. Thank you for engaging with our Titanic Survival Prediction Project.

About Author