🌟 A collection of papers, datasets, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).

🔥🔥🔥 Last Updated on 2024.01.02 🔥🔥🔥

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
- | Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding | CVPRW2021 | Paper | link |
GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | link |
SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | SatMAE | link |
RS-BYOL | Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images | JSTARS2022 | RS-BYOL | null |
GeCo | Geographical Supervision Correction for Remote Sensing Representation Learning | TGRS2022 | GeCo | null |
RingMo | RingMo: A remote sensing foundation model with masked image modeling | TGRS2022 | RingMo | Code |
RVSA | Advancing plain vision transformer toward remote sensing foundation model | TGRS2022 | RVSA | link |
RSP | An Empirical Study of Remote Sensing Pretraining | TGRS2022 | RSP | link |
MATTER | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | CVPR2022 | MATTER | null |
CSPT | Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | RS2022 | CSPT | link |
- | Self-supervised Vision Transformers for Land-cover Segmentation and Classification | CVPRW2022 | Paper | link |
BFM | A billion-scale foundation model for remote sensing images | Arxiv2023 | BFM | null |
TOV | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | link |
CMID | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | TGRS2023 | CMID | link |
RingMo-Sense | RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling | TGRS2023 | RingMo-Sense | null |
IaI-SimCLR | Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery | CVPRW2023 | IaI-SimCLR | null |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | link |
SatLas | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatLas | link |
GFM | Towards Geospatial Foundation Models via Continual Pretraining | ICCV2023 | GFM | link |
Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | ICCV2023 | Scale-MAE | link |
SpectralGPT | SpectralGPT: Spectral Foundation Model | Arxiv2023 | SpectralGPT | null |
DINO-MC | DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops | Arxiv2023 | DINO-MC | link |
CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | NeurIPS2023 | CROMA | link |
Cross-Scale MAE | Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing | NeurIPS2023 | Cross-Scale MAE | null |
DeCUR | DeCUR: decoupling common & unique representations for multimodal self-supervision | Arxiv2023 | DeCUR | link |
Presto | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Arxiv2023 | Presto | link |
CtxMIM | CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | Arxiv2023 | CtxMIM | null |
XGeo | Multisensory Geospatial Models via Cross-Sensor Pretraining | - | XGeo | null |
FG-MAE | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Arxiv2023 | FG-MAE | link |
Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | Arxiv2023 | Prithvi | link |
RingMo-lite | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Arxiv2023 | RingMo-lite | null |
- | A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion | IGARSS2023 | Paper | null |
EarthPT | EarthPT: a foundation model for Earth Observation | Arxiv2023 | EarthPT | null |
USat | USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery | Arxiv2023 | USat | link |
FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon |
AIEarth | Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data | Arxiv2023 | AIEarth | link |
SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | Arxiv2023 | SkySense | Coming soon |

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv2023 | RemoteCLIP | link |
GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | Arxiv2023 | GeoChat | link |
GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | null |
- | Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | Arxiv2023 | Paper | link |

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
DiffusionSat | DiffusionSat: A Generative Foundation Model for Satellite Imagery | Arxiv2023 | DiffusionSat | null |
Seg2Sat | Seg2Sat - Segmentation to aerial view using pretrained diffuser models | Github | null | link |
- | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | NeurIPSW2023 | Paper | link |

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | Coming soon |

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
- | Self-supervised audiovisual representation learning for remote sensing data | JAG2022 | Paper | link |

Abbreviation | Title | Publication | Paper | Attribute | Link |
---|---|---|---|---|---|
fMoW | Functional Map of the World | CVPR2018 | fMoW | Vision | link |
SEN12MS | SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion | - | SEN12MS | Vision | link |
BEN-MM | BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval | GRSM2021 | BEN-MM | Vision | link |
MillionAID | On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID | JSTARS2021 | MillionAID | Vision | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | Vision | link |
fMoW-S2 | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | fMoW-S2 | Vision | link |
TOV-RS-Balanced | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | Vision | link |
SSL4EO-S12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation | GRSM2023 | SSL4EO-S12 | Vision | link |
SSL4EO-L | SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Arxiv2023 | SSL4EO-L | Vision | link |
SatlasPretrain | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatlasPretrain | Vision (Supervised) | link |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | Vision | Coming soon |
RSVG | RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | TGRS2023 | RSVG | Vision-Language | link |
RS5M | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | RS5M | Vision-Language | link |
GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | GEO-Bench | Vision (Evaluation) | link |
RSICap & RSIEval | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | Vision-Language | Coming soon |
SkyScript | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyScript | Vision-Language | Coming soon |

Title | Publication | Paper | Attribute |
---|---|---|---|
Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS2023 | Paper | Vision & Vision-Language |
Vision-Language Models in Remote Sensing: Current Progress and Future Trends | Arxiv2023 | Paper | Vision-Language |
The Potential of Visual ChatGPT For Remote Sensing | Arxiv2023 | Paper | Vision-Language |
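Large Remote Sensing Models: Progress and Prospects (遥感大模型:进展与前瞻) | Geomatics and Information Science of Wuhan University 2023 | Paper | Vision & Vision-Language |
Geographic Artificial Intelligence Samples: Models, Quality, and Services (地理人工智能样本:模型、质量与服务) | Geomatics and Information Science of Wuhan University 2023 | Paper | - |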
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | JSTARS2023 | Paper | Vision & Vision-Language |
Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | Vision |
An Agenda for Multimodal Foundation Models for Earth Observation | IGARSS2023 | Paper | Vision |
Transfer learning in environmental remote sensing | RSE2024 | Paper | Transfer learning |
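A Review of the Development of Remote Sensing Foundation Models and Future Outlook (遥感基础模型发展综述与未来设想) | National Remote Sensing Bulletin 2023 | Paper | - |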
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Arxiv2023 | Paper | Vision-Language |

If you find this repository useful, please consider giving it a star ⭐ and citing:

```bibtex
@misc{guo2023skysense,
  title={SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery},
  author={Xin Guo and Jiangwei Lao and Bo Dang and Yingying Zhang and Lei Yu and Lixiang Ru and Liheng Zhong and Ziyuan Huang and Kang Wu and Dingxiang Hu and Huimei He and Jian Wang and Jingdong Chen and Ming Yang and Yongjun Zhang and Yansheng Li},
  year={2023},
  eprint={2312.10115},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```