This repository contains the code used to create the models and results presented in the paper MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning. It modifies the ConvNeXt V2 architecture to work with MMEarth, a multi-modal geospatial remote sensing dataset.
See INSTALL.md for instructions on installing the dependencies.
See TRAINING.md for more details on training and finetuning.
All the pretraining weights can be downloaded from here. The folders are named according to the format below, and inside each folder you will find a checkpoint .pth weight file. An example of loading the weights is provided in the examples folder; a minimal sketch is also included after the naming format below.
```
pt-all_mod_$MODEL_$DATA_$IMGSIZE_$LOSS/

$MODEL:   atto or tiny
$DATA:    100k or 1M
$IMGSIZE: 128 or 64
$LOSS:    uncertainty or unweighted  # loss weighting strategy; most experiments in the paper use the uncertainty method

# Note that while the stored image size is 128 or 64, during pretraining we use a
# random crop to make the image sizes 112 and 56, respectively.
```
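Below is a minimal sketch of loading one of these checkpoints with PyTorch. The folder name follows the convention above, but the checkpoint filename, the "model" key inside the checkpoint, and the encoder construction are assumptions for illustration; for the actual loading code, see the examples folder.

```python
import torch

# Illustrative path following the naming convention above:
# atto model, 1M pretraining subset, 128x128 images, uncertainty loss weighting.
# The checkpoint filename inside the folder is a placeholder.
ckpt_path = "pt-all_mod_atto_1M_128_uncertainty/checkpoint.pth"

# Load on CPU; we assume the weights are stored under a "model" key
# (common in ConvNeXt V2-style training scripts) and fall back to the raw dict otherwise.
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint["model"] if "model" in checkpoint else checkpoint

# Inspect a few parameter names and shapes before loading them into the
# MMEarth-adapted ConvNeXt V2 encoder (built as shown in the examples folder).
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# model.load_state_dict(state_dict, strict=False)  # strict=False skips pretraining-only heads
```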
This repository borrows from the ConvNeXt V2 repository.
Please cite our paper if you use this code or any of the provided data.
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, & Nico Lang (2024). MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning.
```bibtex
@misc{nedungadi2024mmearth,
  title={MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning},
  author={Vishal Nedungadi and Ankit Kariryaa and Stefan Oehmcke and Serge Belongie and Christian Igel and Nico Lang},
  year={2024},
  eprint={2405.02771},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```