This is a collection of my submissions to the ongoing Kaggle Playground Series!

Primary language: Jupyter Notebook. License: MIT.

Kaggle-Playground-Series 🏅

My submissions so far:

| Episode | Project | Best Public Score |
| --- | --- | --- |
| PS4E4 | Regression with an Abalone Dataset | 0.14804 (RMSLE) |
| PS4E5 | Regression with a Flood Prediction Dataset | 0.85192 (R²) |
| PS4E6 | Classification with an Academic Success Dataset | 0.83673 (Accuracy) |
| PS4E7 | Binary Classification of Insurance Cross Selling | 0.87836 (ROC AUC) |
| PS4E8 | Binary Prediction of Poisonous Mushrooms | -- (MCC) |

PSXEY = Playground Season X, Episode Y (for example, PS4E4 is Season 4, Episode 4)
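
For readers unfamiliar with the metrics named in the table, here is a minimal pure-Python sketch of how each one is commonly defined. It is not the code used in the notebooks or Kaggle's scoring code; the function names are my own, and the binary metrics assume labels encoded as 0 and 1.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (the PS4E4 metric)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def r2(y_true, y_pred):
    """Coefficient of determination (the PS4E5 metric)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of exactly matching labels (the PS4E6 metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC (the PS4E7 metric), computed as the share of
    positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (the PS4E8 metric)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Lower is better for RMSLE; higher is better for the other four. The pairwise `roc_auc` here is quadratic in the number of rows, so it is only suitable for small illustrative inputs.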

About the Tabular Playground Series

The goal of the Tabular Playground Series is to give the Kaggle community a variety of fairly lightweight challenges for learning and sharpening skills in different aspects of machine learning and data science. Each competition generally lasts only a few weeks, though the duration may be longer or shorter depending on the challenge. The challenges generally use fairly lightweight datasets that are synthetically generated from real-world data, giving participants an opportunity to iterate quickly through model and feature engineering ideas, create visualizations, and so on.

Synthetically-Generated Datasets

Using synthetic data for Playground competitions allows us to strike a balance between having real-world data (with named features) and ensuring test labels are not publicly available. This allows us to host competitions with more interesting datasets than in the past. While there are still challenges with synthetic data generation, the state of the art is much better now than when we started the Tabular Playground Series two years ago, and the goal is to produce datasets that have far fewer artifacts. Please feel free to give us feedback on the datasets for the different competitions so that we can continue to improve!