This project is probably a rite of passage for everyone getting into data science. I never really enjoyed the movie, but the door could totally fit both Rose and Jack.
This is a binary classification problem: each passenger either survived (1) or died (0). Here is a list of the columns in the dataset:
- PassengerId: unique ID for each passenger
- Survived: whether the passenger survived (1) or not (0)
- Pclass: class of the passenger's ticket, either 1, 2 or 3
- Sex: passenger's sex (male or female)
- Age: passenger's age
- SibSp: number of siblings or spouses aboard the Titanic
- Parch: number of parents or children aboard the Titanic
- Ticket: passenger's ticket number
- Fare: price paid for the passenger's ticket
- Cabin: passenger's cabin number
- Embarked: port where the passenger embarked, one of:
  - C: Cherbourg
  - Q: Queenstown
  - S: Southampton
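A minimal sketch of how these columns might be prepared for a model, assuming pandas and a few hypothetical toy rows shaped like Kaggle's `train.csv` (not real passengers). Besides the columns above, that file also has a `Name` column, which is used here to show a regular-expression title extraction. The regex, the median/mode imputation and the one-hot encoding are illustrative choices, not necessarily the ones used in this project.

```python
import pandas as pd

# Hypothetical toy rows shaped like Kaggle's train.csv (not real passengers).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Anne", "Moe, Master. Tom"],
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 30.0, 4.0],
    "SibSp": [1, 1, 0, 3],
    "Parch": [0, 0, 0, 1],
    "Fare": [7.25, 71.28, 13.0, 21.08],
    "Embarked": ["S", "C", None, "S"],
})

# Regex: capture the title between the comma and the first period,
# e.g. "Doe, Mr. John" -> "Mr".
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# Binary-encode Sex (male=0, female=1).
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Fill missing Age with the median age and missing Embarked with the most common port.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# One-hot encode the port and the title: one column per value present in the data.
features = pd.get_dummies(df.drop(columns=["PassengerId", "Name"]),
                          columns=["Embarked", "Title"])
print(features.columns.tolist())
```

Dropping `PassengerId` reflects that a unique ID carries no signal about survival; `Ticket` and `Cabin` are left out of the toy rows for brevity.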
Although we know exactly who survived the Titanic, the project is still a useful way to apply important concepts in data science and machine learning. So here it is!
Objective: Predict which passengers survived the Titanic (Jack died)
Techniques used:
- pandas, matplotlib, NumPy
- Scikit-learn
- Logistic regression, cross-validation, k-nearest neighbours
- Regular expressions
- Heatmap
- Recursive feature elimination
- Hyperparameter optimization
- Grid search
- Random forest classifier
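The modelling techniques above could be chained roughly as follows. This is a sketch under stated assumptions: it uses scikit-learn's `make_classification` as a synthetic stand-in for the encoded Titanic features, and the fold counts, the `RFECV` settings and the hyperparameter grid are illustrative guesses rather than the project's actual values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic binary-classification data standing in for the Titanic features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# 1. Baseline models scored with 10-fold cross-validation (mean accuracy).
cv_means = {}
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier(n_neighbors=5))]:
    cv_means[name] = cross_val_score(model, X, y, cv=10).mean()
    print(name, round(cv_means[name], 3))

# 2. Recursive feature elimination with cross-validation: repeatedly drop the
#    least important feature and keep the subset with the best CV score.
selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=5)
selector.fit(X, y)
X_selected = X[:, selector.support_]
print("features kept:", selector.n_features_)

# 3. Grid search over random-forest hyperparameters on the selected features.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 3],
}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X_selected, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

On the real data, k-nearest neighbours is sensitive to feature scale, so wrapping it in a `Pipeline` with `StandardScaler` is usually worthwhile before comparing it with logistic regression.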