Machine Learning Practice

Data playground for improving machine learning skills using Kaggle datasets. Work in Progress: Listed here are Kaggle competitions I am working on, not necessarily finished.

To create the mlp conda environment, run:

conda env create -f environment.yml

Contents

INGV - Volcanic Eruption Prediction 🌋

What if scientists could anticipate volcanic eruptions as they predict the weather? While determining rain or shine days in advance is more difficult, weather reports become more accurate on shorter time scales. A similar approach with volcanoes could make a big impact. Just one unforeseen eruption can result in tens of thousands of lives lost. If scientists could reliably predict when a volcano will next erupt, evacuations could be more timely and the damage mitigated.

Enter Italy's Istituto Nazionale di Geofisica e Vulcanologia (INGV), with its focus on geophysics and volcanology. The INGV's main objective is to contribute to the understanding of the Earth's system while mitigating the associated risks. Tasked with the 24-hour monitoring of seismicity and volcanic activity across the country, the INGV seeks to find the earliest detectable precursors that provide information about the timing of future volcanic eruptions.

The dataset is 31.25 GB and contains 8953 files.

Download the data zip file directly from Kaggle by running the following command within the data/ directory:

kaggle competitions download -c predict-volcanic-eruptions-ingv-oe

The data zip file can then be unzipped via:

unzip predict-volcanic-eruptions-ingv-oe.zip
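Because the archive is large, it can be worth inspecting it before (or instead of) extracting everything. The following is a minimal sketch, not part of this repo, that counts the files in a zip and sums their uncompressed sizes using only the Python standard library. The function name summarize_zip is hypothetical, and whether the totals match the figures quoted above exactly is an assumption to check, not a guarantee.

```python
import zipfile


def summarize_zip(zip_path):
    """Return (file_count, total_uncompressed_bytes) without extracting anything."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so only real files are counted.
        files = [info for info in archive.infolist() if not info.is_dir()]
        return len(files), sum(info.file_size for info in files)


# Example usage (run inside data/ after the download has finished):
# n_files, n_bytes = summarize_zip("predict-volcanic-eruptions-ingv-oe.zip")
# print(n_files, "files,", round(n_bytes / 1024**3, 2), "GiB uncompressed")
```

A file count far below the expected number usually points to an interrupted download.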

For the download to succeed, your ~/.kaggle folder must contain a valid Kaggle API token, kaggle.json.

If it does not, create a new token from within your Kaggle account settings, then move the token from the Downloads folder to the ~/.kaggle folder.
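If the download still fails, a quick sanity check of the token file can help. The sketch below is an illustration rather than anything shipped with this repo: check_kaggle_token is a hypothetical helper that verifies the file exists, parses as JSON, contains the username and key fields, and (on POSIX systems) is not accessible by other users.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(token_path):
    """Return a list of human-readable problems found with the token file."""
    token_path = Path(token_path)
    if not token_path.is_file():
        return [f"{token_path} does not exist"]

    try:
        token = json.loads(token_path.read_text())
    except json.JSONDecodeError:
        return [f"{token_path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{token_path} does not contain a JSON object"]

    problems = []
    # A Kaggle API token stores the account name and the secret key.
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"missing '{field}' field")

    # The token is a secret: flag it if group/other have any permission on it.
    if os.name == "posix":
        mode = stat.S_IMODE(token_path.stat().st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("token is accessible by other users; consider chmod 600")
    return problems


# Example usage:
# issues = check_kaggle_token(Path.home() / ".kaggle" / "kaggle.json")
# print("Token looks fine." if not issues else "\n".join(issues))
```

An empty list means none of these checks failed; it does not prove that the token is still valid on Kaggle's side.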

Open In Colab

Link to Kaggle Competition

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Warm-up Exercises 🔥


Each exercise includes the following:

data/: contains the dataset description, train and test sets, and sample prediction CSV file

real_data/: contains the full dataset and/or the Kaggle leaderboard score distribution

model.ipynb: sample ML workflow using Jupyter Notebook
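The layout listed above can be checked programmatically before starting an exercise. This is a small sketch under the assumption that each exercise directory holds exactly those three entries at its top level; missing_entries is a hypothetical helper, not a script from this repo.

```python
from pathlib import Path

# Entries each exercise directory is expected to contain, per the list above.
EXPECTED_ENTRIES = ("data", "real_data", "model.ipynb")


def missing_entries(exercise_dir):
    """Return the expected entries that are absent from exercise_dir."""
    root = Path(exercise_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Example usage, from inside an exercise directory:
# print(missing_entries(".") or "Layout looks complete.")
```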

These are fun competitions aimed at helping you get started with machine learning. While the datasets are publicly available on the internet, looking up the answers defeats the entire purpose. So seriously, don't do that.


1. Titanic - Machine Learning from Disaster 🚢


Adapted from GIF animation by Artistosteles, Wikimedia Commons

Link to Kaggle Competition

2. House Prices - Advanced Regression Techniques 🏠

Link to Kaggle Competition

3. Digit Recognizer 🔢

Link to Kaggle Competition

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check your score! 📈

score.py is a Python script for evaluating final model performance on the test set. Call it from within each exercise directory via:

python ../score.py -f [prediction csv filepath]
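For readers curious what such a scorer involves, here is an illustrative sketch. It is not the repo's actual score.py, whose internals are not shown here. It assumes predictions and ground truth are CSV files with rows in the same order, and it implements the two metrics Kaggle uses for the warm-up exercises: classification accuracy (Titanic, Digit Recognizer) and the RMSE between log-transformed values (House Prices). All function names are hypothetical; only the -f flag comes from the command above.

```python
import argparse
import csv
import math


def read_column(csv_path, column):
    """Read one column of a CSV file as a list of strings."""
    with open(csv_path, newline="") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_of_logs(y_true, y_pred):
    """RMSE between log(prediction) and log(observed value)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def build_parser():
    """Command-line interface mirroring the -f flag shown above."""
    parser = argparse.ArgumentParser(description="Score a prediction CSV (sketch).")
    parser.add_argument("-f", dest="prediction_csv", required=True,
                        help="prediction CSV filepath")
    return parser


# Example usage. The ground-truth path and the column name are assumptions:
# args = build_parser().parse_args()
# truth = read_column("real_data/truth.csv", "Survived")
# preds = read_column(args.prediction_csv, "Survived")
# print(f"accuracy: {accuracy(truth, preds):.4f}")
```

Comparing labels as strings keeps the accuracy function independent of the label type, but it means "1" and "1.0" count as different, so predictions should be written in the same format as the ground truth.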