
SSL4EO-S12: a large-scale dataset for self-supervised learning in Earth observation


SSL4EO-S12

The SSL4EO-S12 dataset is a large-scale multimodal, multitemporal dataset for unsupervised/self-supervised pre-training in Earth observation. It consists of unlabeled patch triplets (Sentinel-1 dual-pol SAR, Sentinel-2 top-of-atmosphere multispectral, Sentinel-2 surface reflectance multispectral) from 251079 locations across the globe. Each patch covers 2640 m x 2640 m and includes four seasonal timestamps.
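As a rough illustration of the scale described above, the following sketch derives the patch size in pixels and the total patch count from the quoted figures. The 10 m ground sampling distance is an assumption (it is the finest Sentinel-2 band resolution); the other constants are taken from the description.

```python
# Figures quoted in the dataset description.
NUM_LOCATIONS = 251_079
PATCH_EXTENT_M = 2640
NUM_SEASONS = 4
NUM_MODALITIES = 3  # S1 SAR, S2 top-of-atmosphere, S2 surface reflectance

# Assumption: 10 m ground sampling distance (finest Sentinel-2 band resolution).
ASSUMED_GSD_M = 10

patch_side_px = PATCH_EXTENT_M // ASSUMED_GSD_M
patches_per_modality = NUM_LOCATIONS * NUM_SEASONS
total_patches = patches_per_modality * NUM_MODALITIES

print(patch_side_px)         # 264
print(patches_per_modality)  # 1004316
print(total_patches)         # 3012948
```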


Access to the dataset

  • Full dataset: The full SSL4EO-S12 dataset (1.5TB, 500GB for each modality) is accessible at mediaTUM. Some IDs are void (gaps in the folder names); they are listed in data/void_ids.csv.
  • Example subset: An example 100-patch subset (600MB) is available at Google Drive.
  • RGB version: An RGB version of the full dataset is available here (the link is currently broken; we are working on it). The raw S2-L1C int16 values are normalized using the mean and standard deviation and then converted to uint8.
  • RGB subset: A random 50k-sample RGB subset (18GB) is available here (the link is currently broken; we are working on it). The sample IDs are listed in data/50k_ids_random.csv.
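The int16-to-uint8 conversion mentioned for the RGB version can be sketched as follows. This is a minimal illustration, not necessarily the exact procedure used to build the released files: the helper `to_uint8`, the two-standard-deviation window (`num_std`), and the toy statistics are all assumptions.

```python
import numpy as np


def to_uint8(band, mean, std, num_std=2.0):
    """Map one raw band to uint8 using a mean/std window.

    Assumption: values within mean +/- num_std * std are stretched
    linearly to [0, 255]; values outside that window are clipped.
    """
    lo = mean - num_std * std
    hi = mean + num_std * std
    scaled = (band.astype(np.float32) - lo) / (hi - lo) * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Toy 2x2 band with made-up statistics (not the dataset's real values).
raw = np.array([[0, 1000], [2000, 4000]], dtype=np.int16)
band_u8 = to_uint8(raw, mean=1500.0, std=500.0)
print(band_u8)
```

In practice the mean and standard deviation would be computed per band, and the three RGB bands would be stacked after conversion.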

Pre-trained models

Pre-trained models for several SSL methods are listed below. All were trained on the 13 bands of S2-L1C for 100 epochs, with inputs clipped to [0, 1].

| SSL method | Arch | BigEarthNet | EuroSAT | So2Sat-LCZ42 | Download | Usage |
|---|---|---|---|---|---|---|
| MoCo | ResNet50 | 91.8% | 99.1% | 60.9% | full ckpt, backbone, logs | define model, load weights |
| MoCo | ViT-S/16 | 89.9% | 98.6% | 61.6% | full ckpt, backbone, logs | define model, load weights |
| DINO | ResNet50 | 90.7% | 99.1% | 63.6% | full ckpt, backbone, logs | define model, load weights |
| DINO | ViT-S/16 | 90.5% | 99.0% | 62.2% | full ckpt, backbone, logs | define model, load weights |
| MAE | ViT-S/16 | 88.9% | 98.7% | 63.9% | full ckpt, backbone, logs | define model, load weights |
| Data2vec | ViT-S/16 | 90.3% | 99.1% | 64.8% | full ckpt, backbone, logs | define model, load weights |
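The difference between a "full ckpt" and a "backbone" download can be illustrated with a small sketch that keeps only the encoder weights from a full checkpoint's state dict. The helper `extract_backbone` is hypothetical, and the `module.encoder_q.` prefix is an assumption borrowed from the reference MoCo implementation; the key names in the checkpoints linked above may differ, so inspect them before relying on this.

```python
def extract_backbone(state_dict, prefix="module.encoder_q."):
    """Return only the entries under `prefix`, with the prefix removed.

    Assumption: the query encoder's weights are stored under `prefix`,
    as in the reference MoCo code. Adjust it to match the actual keys.
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a loaded checkpoint; real values would be tensors.
full_ckpt = {
    "module.encoder_q.conv1.weight": "w0",
    "module.encoder_q.layer1.0.conv1.weight": "w1",
    "module.encoder_k.conv1.weight": "w2",
}
backbone = extract_backbone(full_ckpt)
print(sorted(backbone))
```

The resulting dictionary could then be passed to a model's `load_state_dict`, provided the architecture was defined with a matching number of input channels (for example, 13 for the S2-L1C models).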

Other pre-trained models:

| SSL method | Arch | Input | Download |
|---|---|---|---|
| MoCo | ResNet18 | S2-L1C 13 bands | full ckpt, backbone, logs |
| | ResNet18 | S2-L1C RGB | full ckpt, full ckpt ep200, backbone, logs |
| | ResNet50 | S2-L1C RGB | full ckpt, backbone, logs |
| | ResNet50 | S1 SAR 2 bands | full ckpt, backbone, logs |

License

This repository is released under the Apache 2.0 license. The dataset and pretrained model weights are released under the CC-BY-4.0 license.

Citation

@article{wang2022ssl4eo,
  title={SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation},
  author={Wang, Yi and Braham, Nassim Ait Ali and Xiong, Zhitong and Liu, Chenying and Albrecht, Conrad M and Zhu, Xiao Xiang},
  journal={arXiv preprint arXiv:2211.07044},
  year={2022}
}