/Project_MB_Regression

Tutorial to learn how to train different machine learning methods to model glacier mass balance.

Primary language: Jupyter Notebook. License: MIT.

Project: Machine learning for glacier mass balance modelling

Authors:
Jordi Bolibar
Facundo Sapienza

Project description

The project consists of 4 Jupyter notebooks in Python, focused on glacier mass balance modelling using different types of regression methods. Data is extracted using the Open Global Glacier Model (OGGM), which provides climate, topographical and mass balance data for almost any glacier on Earth. The project focuses on all glaciers in Scandinavia, with the goal of learning multi-annual mass balance changes from the geodetic mass balance dataset of Hugonnet et al. (2021). We use scikit-learn, one of the most popular machine learning libraries for beginners.

The following topics are covered in this project:

  1. Data pre-processing 🌍: Here we learn how to extract all the necessary climate, topographical and mass balance data for any glacier on Earth using OGGM.

  2. Data exploration 🔍: Here we learn how to understand the dataset we have compiled, what assumptions lie behind it, and how to correctly construct a good dataset for regression machine learning models of physical processes.

  3. Training 🚀: Here we introduce the different machine learning methods that we can use to simulate glacier mass balance. We will explore the characteristics of each one, including their specific hyperparameters.

  4. Validation 🎯: Once we are acquainted with the different machine learning models, we will introduce the concepts of cross-validation and hyperparameter selection. We will choose one of the models and tune it in detail using cross-validation, so that it generalizes correctly to unseen data.
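
As a rough illustration of steps 3 and 4, the sketch below trains two scikit-learn regressors and selects their hyperparameters by cross-validation. It is not taken from the notebooks: the data is synthetic, the feature names (`temp`, `precip`, `elev`) and the coefficients are made up for the example, and splitting by glacier (so that the same glacier never appears in both training and test folds) is an assumption about what a sensible validation strategy could look like here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, GroupKFold, GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset (assumption: the notebooks build
# their features from OGGM; here we only mimic the overall shape).
n_glaciers, n_periods = 60, 4
n_samples = n_glaciers * n_periods
glacier_id = np.repeat(np.arange(n_glaciers), n_periods)

temp = rng.normal(0.0, 1.0, n_samples)    # hypothetical temperature anomaly
precip = rng.normal(0.0, 1.0, n_samples)  # hypothetical precipitation anomaly
elev = np.repeat(rng.normal(0.0, 1.0, n_glaciers), n_periods)  # per-glacier feature

X = np.column_stack([temp, precip, elev])
# Made-up linear relationship plus noise, standing in for mass balance.
y = -0.8 * temp + 0.5 * precip + 0.3 * elev + rng.normal(0.0, 0.1, n_samples)

# Hold out 20% of the glaciers (not 20% of the rows) as a test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=glacier_id))
X_train, y_train, groups_train = X[train_idx], y[train_idx], glacier_id[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Candidate models with a small hyperparameter grid each.
candidates = {
    "lasso": (
        make_pipeline(StandardScaler(), Lasso()),
        {"lasso__alpha": [0.001, 0.01, 0.1]},
    ),
    "random_forest": (
        RandomForestRegressor(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
}

test_rmse = {}
for name, (model, grid) in candidates.items():
    # Grouped k-fold keeps all samples of a glacier in the same fold.
    search = GridSearchCV(
        model,
        grid,
        cv=GroupKFold(n_splits=5),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X_train, y_train, groups=groups_train)
    y_pred = search.predict(X_test)
    test_rmse[name] = float(np.sqrt(mean_squared_error(y_test, y_pred)))
    print(name, search.best_params_, round(test_rmse[name], 3))
```

Because the synthetic target is linear in the features, the linear model is expected to do well here; on real glacier data the ranking of models may be quite different.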

Bonus projects!

The main project aims to teach the fundamentals of machine learning workflows applied to glacier and geophysical modelling. For those seeking to dig deeper into the glaciological aspects, we propose two bonus projects. Both use distributed (gridded) datasets, which have a much higher spatial resolution and complexity. Two additional notebooks cover them:

  1. Distributed data pre-processing 📊: In this notebook we learn how to download gridded datasets for any glacier in the world, including geodetic mass balance, ice thickness data, topographical data such as surface slope, aspect, distance to border, and how to easily downscale climate data to those glaciers.

  2. Bonus projects ♦️: In this notebook we introduce the two bonus projects: one focusing on predicting gridded geodetic mass balance rates, and another on inferring distributed glacier ice thickness.

We will be updating this project iteratively. If you come across any typos or mistakes in the notebooks, please feel free to make a pull request!