This is a collection of Kaggle competitions through which I have practised and learnt various aspects of data science and machine learning methods. So far, the projects are:
- Disaster Titanic: predict the survival of passengers of the Titanic using various features (class, cabin position, gender, etc.). I mostly followed a thorough guide on this one to discover the subtle details of filling missing values and feature engineering (a minimal sketch of this kind of preprocessing is included after this list).
- Spaceship Titanic: a more involved version of the previous one, with many missing values. I applied what I learnt from Disaster Titanic here, drawing inspiration from various posts for the most difficult remaining missing values. The modelling itself is only a very simple Random Forest Classifier fit (no model selection or cross validation yet; see the Random Forest sketch after this list), which already gave a satisfactory result of nearly 80% accuracy. This confirmed to me that handling missing values and feature engineering are indeed crucial to training a good machine learning model.
- Credit Risk Analysis: predict whether a loan is risky using various features. The end result performs a little too well, which indicates possible overfitting. However, this dataset was a great exercise in filling missing values and encoding categorical features (see the encoding sketch after this list).
- Pokemon Legendary: predict whether a Pokemon is legendary using data scraped from Serebii. The dataset is very clean, so this project was used to compare the effectiveness of various models (see the model-comparison sketch after this list). In this case it is easy to obtain good performance (as high as 99% accuracy) since legendary Pokemon follow relatively consistent patterns (strength, position in the Pokedex, type, etc.).
- NASA Asteroids Classification: predict whether a Near Earth Object is risky based on various aspects of its trajectory.
- Abalone Regression: a competition on the Abalone dataset where the number of rings of an abalone must be predicted, essentially from its dimensions and sex.
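
Below are a few minimal Python sketches (pandas / scikit-learn) of the kinds of steps mentioned above. They are illustrative only: file paths, column names, and parameters are assumptions, not the exact code used in the notebooks.

The first sketch shows simple missing-value filling and feature engineering in the spirit of the Titanic projects; the column names follow the public Kaggle Titanic dataset.

```python
import pandas as pd

df = pd.read_csv("train.csv")  # assumed path to the competition training data

# Numerical columns: fill with the median, which is robust to outliers.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].median())

# Categorical columns: fill with the most frequent value.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Simple engineered features: family size and whether a cabin is known.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["HasCabin"] = df["Cabin"].notna().astype(int)
```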
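
The Spaceship Titanic modelling step is essentially a single Random Forest fit. A minimal version of that kind of training could look like this; the feature table is assumed to be already imputed and encoded, and "Transported" is the competition target.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("train_prepared.csv")  # assumed: already imputed and encoded
X = df.drop(columns=["Transported"])
y = df["Transported"]

# Hold out a validation split to get a rough accuracy estimate.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(f"Validation accuracy: {clf.score(X_val, y_val):.3f}")
```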
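
For the credit risk data, the encoding step can be sketched as below; the column names are placeholders, not the actual schema of the dataset.

```python
import pandas as pd

df = pd.read_csv("credit.csv")  # placeholder path

# Nominal categories: one-hot encode (drop_first avoids a redundant column).
df = pd.get_dummies(df, columns=["home_ownership", "loan_purpose"], drop_first=True)

# Ordinal categories: map to integers that preserve their order.
grade_order = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6}
df["loan_grade"] = df["loan_grade"].map(grade_order)
```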
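
Finally, comparing several models on a clean dataset such as the Pokemon one boils down to cross-validating each candidate; here synthetic data stands in for the real features and the legendary flag.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the prepared Pokemon features and the legendary target.
X, y = make_classification(n_samples=800, n_features=10, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```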