This repository contains the code used to create the models and results presented in the paper MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning. It modifies the ConvNeXt V2 architecture to work with MMEarth, a multi-modal geospatial remote sensing dataset.
See INSTALL.md for instructions on installing the dependencies.
See TRAINING.md for more details on training and finetuning.
All the pretraining weights can be downloaded from here. The folders are named according to the format below, and inside each folder you will find a checkpoint .pth weight file. An example of loading the weights is provided in the examples folder; a minimal sketch is also included after the naming format below.
```
pt-all_mod_$MODEL_$DATA_$IMGSIZE_$LOSS/

$MODEL:   atto or tiny
$DATA:    100k or 1M
$IMGSIZE: 128 or 64
$LOSS:    uncertainty or unweighted  # loss weighting strategy; most experiments in the paper use the uncertainty method

# Note that while the stored image size is 128 or 64, during pretraining we use a
# random crop to make the image sizes 112 and 56, respectively.
```
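Below is a minimal sketch of loading one of these checkpoints with PyTorch. The folder name follows the convention above, but the checkpoint filename, the "model" key inside the checkpoint, and the encoder construction are assumptions for illustration; for the actual loading code, see the examples folder.

```python
import torch

# Illustrative path following the naming convention above:
# atto model, 1M pretraining subset, 128x128 images, uncertainty loss weighting.
# The checkpoint filename inside the folder is a placeholder.
ckpt_path = "pt-all_mod_atto_1M_128_uncertainty/checkpoint.pth"

# Load on CPU; we assume the weights are stored under a "model" key
# (common in ConvNeXt V2-style training scripts) and fall back to the raw dict otherwise.
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint["model"] if "model" in checkpoint else checkpoint

# Inspect a few parameter names and shapes before loading them into the
# MMEarth-adapted ConvNeXt V2 encoder (built as shown in the examples folder).
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# model.load_state_dict(state_dict, strict=False)  # strict=False skips pretraining-only heads
```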
This repository borrows from the ConvNeXt V2 repository.
Please cite our paper if you use this code or any of the provided data.
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, & Nico Lang (2024). MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning.
```bibtex
@misc{nedungadi2024mmearth,
  title={MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning},
  author={Vishal Nedungadi and Ankit Kariryaa and Stefan Oehmcke and Serge Belongie and Christian Igel and Nico Lang},
  year={2024},
  eprint={2405.02771},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```