
Kaggle-Titanic

This repo contains a number of scripts and notebooks trying out things on Kaggle's Titanic dataset.

A tongue-in-cheek look at removing the gender bias in the dataset. It includes:

  • Using IBM's AIF360 to evaluate bias in data and models, and trying out reweighing as a method to remove the bias.
  • An object-oriented approach to preprocessing using sklearn Pipelines, with custom transformers.
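The custom-transformer approach above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the `FillAge` transformer and the toy data are made up for the example, but the pattern (subclass `BaseEstimator`/`TransformerMixin`, learn state in `fit`, apply it in `transform`, compose with `Pipeline`) is the standard sklearn one.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class FillAge(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fill missing Age with the training median."""

    def fit(self, X, y=None):
        # Learn the imputation value from the training data only
        self.median_ = X["Age"].median()
        return self

    def transform(self, X):
        # Apply the learned median to any dataset (train or test)
        X = X.copy()
        X["Age"] = X["Age"].fillna(self.median_)
        return X


# Toy data standing in for the Titanic training frame
train = pd.DataFrame({"Age": [22.0, None, 30.0]})

pipeline = Pipeline([("fill_age", FillAge())])
print(pipeline.fit_transform(train))
```

Because the transformer stores its state in `fit`, the same fitted pipeline can be reused on the test set without leaking test-set statistics into the preprocessing.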

See also: https://www.kaggle.com/garethjns/titanicsexism-fairness-in-ml

A fork of this kernel, attempting to create a reasonably scoring model with as little code as possible. A great example of how not to program.
See also: https://www.kaggle.com/garethjns/shortest-titanic-kernel-0-78468

LightGBM

Examples working with Microsoft's LightGBM

Introduction to pre-processing and preparing the data to use in LightGBM.
See also: https://www.kaggle.com/garethjns/microsoft-lightgbm-0-795
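One piece of the preparation that trips people up: LightGBM handles missing values natively but needs string columns converted to numbers (or pandas `category` dtype). A minimal, pandas-only sketch with made-up rows in the shape of the Titanic columns:

```python
import pandas as pd

# Toy rows shaped like the Titanic data (illustrative values only)
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", None],
    "Age": [22.0, None, 30.0],
})

# NaN in numeric columns can be left as-is for LightGBM;
# only the string columns need integer codes (missing becomes -1).
for col in ["Sex", "Embarked"]:
    df[col] = df[col].astype("category").cat.codes

print(df.dtypes)
```

The resulting frame is all-numeric and can be passed to `lightgbm.Dataset` or the sklearn-style `LGBMClassifier` directly.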

Script to prepare the data, grid-search the best model parameters, and fit a (slightly more) robust ensemble on multiple data splits. Can score about 0.822 (top 3%) with a lucky random seed.
See also: https://www.kaggle.com/garethjns/microsoft-lightgbm-with-parameter-tuning-0-822
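The tune-then-ensemble idea can be sketched as below. This is an illustration under stated assumptions, not the kernel's code: it uses sklearn's `GradientBoostingClassifier` as a stand-in for LightGBM, synthetic data, and a deliberately tiny parameter grid. The shape is the same, though: grid-search once, then refit the best parameters on several random splits and average the predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for LightGBM
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = make_classification(n_samples=200, random_state=0)

# 1) Grid-search model parameters (tiny grid, for illustration only)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
)
grid.fit(X, y)

# 2) Refit the best parameters on several random splits and
#    average the predicted probabilities across the fits
probs = []
for train_idx, _ in ShuffleSplit(n_splits=5, test_size=0.2, random_state=0).split(X):
    model = GradientBoostingClassifier(**grid.best_params_, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    probs.append(model.predict_proba(X)[:, 1])

ensemble_pred = (np.mean(probs, axis=0) > 0.5).astype(int)
```

Averaging over splits smooths out the variance from any single train/test partition, which matters on a dataset as small as Titanic's 891 training rows.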

A very simple, fast script that fits a logistic regression model with almost no preprocessing. Can score in the top 10% with a lucky random seed, and is a good example of why such a small dataset is terrible for model performance evaluation!
See also: https://www.kaggle.com/garethjns/3-seconds-and-3-features-top-10
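The flavour of that kernel can be sketched in a few lines. The toy frame below stands in for the Titanic training CSV and the feature choice is illustrative (the kernel's exact three features may differ); the point is just how little is needed to get a fitted model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the Titanic training data (illustrative values only)
train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 2, 1],
    "Fare": [7.25, 71.28, 13.0, 52.0],
    "Survived": [0, 1, 1, 0],
})

# One-hot encode Sex; Pclass and Fare are used as-is
X = pd.get_dummies(train[["Sex", "Pclass", "Fare"]], columns=["Sex"])
y = train["Survived"]

model = LogisticRegression().fit(X, y)
print(model.predict(X))
```

With so few samples and features, cross-validated scores swing wildly between seeds, which is exactly the evaluation problem the kernel is poking fun at.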