/Awesome-RSITR

🎮 A Benchmark and Awesome Collection of Methods for Remote Sensing Image-Text Retrieval (RSITR) | Remote Sensing Cross-Modal Retrieval (RSCMR) | Remote Sensing Vision-Language Models (RSVLMs)

🎮 Awesome Remote Sensing Image-Text Retrieval | Remote Sensing Cross-Modal Retrieval | Remote Sensing Vision-Language Models

🧭 Guideline

A benchmark and awesome collection of papers on Remote Sensing Image-Text Retrieval (RSITR) | Remote Sensing Cross-Modal Retrieval (RSCMR), gathered from the Internet. If anything is missing, please contact me at jiancheng.pan.plus@gmail.com. 🤝 If you want to join the Remote Sensing Vision-Language Models (RSVLMs) community, you can join via the Slack Group.

💻 News

Major news from the RSVLMs community.

  • 2023/12/20: SkyScript, a comprehensive vision-language dataset for remote sensing images covering 29K distinct semantic tags (AAAI 2024) [link].
  • 2023/11/24: GeoChat: Grounded Large Vision-Language Model for Remote Sensing [link].
  • 2023/06/20: RS5M, a remote sensing dataset of 5M+ image-text pairs, released [link].
  • 2023/06/19: RemoteCLIP, the first vision-language foundation model for remote sensing, proposed [link].

📊 Remote Sensing Captions Dataset

A collection of the more popular image-text pair datasets for remote sensing. Please get in touch if you know of others that should be added.

| Dataset Name | Number of Images | Image Resolution | VLMs |
|---|---|---|---|
| UCM-Captions | 2,100 | 256 × 256 | - |
| Sydney-Captions | 613 | 500 × 500 | - |
| RSICD | 10,921 | 224 × 224 | - |
| RSITMD | 4,743 | 256 × 256 | - |
| NWPU-Captions | 31,500 | 256 × 256 | - |
| RS5M | 5 million+ | All Resolutions | GeoRSCLIP |
| SkyScript | 5.2 million+ | All Resolutions | SkyCLIP |

🆚 RSITR | RSCMR Benchmark

Contributions of additional RSITR | RSCMR methods are welcome.

📌 Cross-Modal Retrieval on RSICD:

https://paperswithcode.com/sota/cross-modal-retrieval-on-rsicd

📌 Cross-Modal Retrieval on RSITMD:

https://paperswithcode.com/sota/cross-modal-retrieval-on-rsitmd
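
Cross-modal retrieval benchmarks of this kind are commonly scored with Recall@K (the percentage of queries whose top K results contain a correct match), often averaged into a single mean recall. The sketch below is only an illustration under that assumption: the function names, the toy query and image ids, and the K values are hypothetical, and the leaderboards linked above may follow a different protocol.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant item is in the top-k of one ranking, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_recall_at_k(rankings, ground_truth, k):
    """Percentage of queries with at least one relevant item in the top k."""
    hits = [recall_at_k(rankings[q], ground_truth[q], k) for q in rankings]
    return 100.0 * sum(hits) / len(hits)


# Hypothetical toy data: two text queries, each with a ranked list of image ids.
rankings = {
    "q1": ["img3", "img1", "img7", "img2", "img9"],
    "q2": ["img5", "img4", "img2", "img8", "img6"],
}
# Hypothetical ground truth: the image(s) that truly match each query.
ground_truth = {"q1": {"img3"}, "q2": {"img8"}}

scores = {k: mean_recall_at_k(rankings, ground_truth, k) for k in (1, 3, 5)}
print(scores)  # {1: 50.0, 3: 50.0, 5: 100.0}

# A single summary number: the average over all K values.
mean_recall = sum(scores.values()) / len(scores)
```

A full evaluation would usually run this in both directions (text-to-image and image-to-text) and average the results.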

📖 RSITR | RSCMR Method

Closed-Domain Method: Training and testing on a single dataset.

Open-Domain Method: Using extra datasets for pre-training to gain more inter-domain knowledge.

Hashing Method: Mapping images and text to compact binary codes so that retrieval on large-scale datasets stays efficient.
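
To illustrate why hashing makes large-scale retrieval cheap, here is a minimal sketch of ranking a gallery by Hamming distance between binary codes. The 8-bit codes and file names are made up for illustration; in an actual hashing method the codes would be produced by a trained model, and none of the papers listed below is being reproduced here.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, gallery_codes: dict) -> list:
    """Return gallery ids sorted from nearest to farthest from the query.

    Ties are broken alphabetically by id so the ordering is deterministic.
    """
    return sorted(
        gallery_codes,
        key=lambda item_id: (hamming_distance(query_code, gallery_codes[item_id]), item_id),
    )


# Hypothetical 8-bit codes; a real model would learn much longer ones.
text_code = 0b10110010
image_codes = {
    "airport.jpg": 0b10110011,  # differs from the query in 1 bit
    "forest.jpg": 0b01001101,   # differs from the query in 8 bits
    "harbor.jpg": 0b10100110,   # differs from the query in 2 bits
}

ranking = rank_by_hamming(text_code, image_codes)
print(ranking)  # ['airport.jpg', 'harbor.jpg', 'forest.jpg']
```

Comparing codes needs only an XOR and a bit count per gallery item, which is what keeps search fast and memory use low as the gallery grows.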

Open-Domain Method

  • [AAAI 2024] | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | [paper] [github]

    • Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal.
  • [ArXiv 2023] | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | [paper] [github]

    • Fan Liu, Delong Chen, Zhan-Rong Guan, Xiaocong Zhou, Jiale Zhu, Jun Zhou.
  • [ArXiv 2023] | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | [paper] [github]

    • Zilun Zhang, Tiancheng Zhao, Yulong Guo, Jianwei Yin.
  • [ArXiv 2023] | RSGPT: A Remote Sensing Vision Language Model and Benchmark | [paper]

    • Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li.
  • [TGRS 2023] | Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | [paper]

    • Yuan Yuan, Yangfan Zhan, Zhitong Xiong.

Closed-Domain Method

  • [ACMMM 2023] | A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | [paper] [github]

    • Jiancheng Pan, Qing Ma, Cong Bai.
  • [ArXiv 2023] | Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | [paper]

    • Jiancheng Pan, Qing Ma, Cong Bai.
  • [Sensors 2023] | A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval | [paper]

    • Fuzhong Zheng, Xu Wang, Luyao Wang, Xiong Zhang, Hongze Zhu, Long Wang, Haisu Zhang.
  • [Remote Sensing 2023] | A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text–Image Retrieval in Remote Sensing | [paper]

    • Xiong Zhang, Weipeng Li, Xu Wang, Luyao Wang, Fuzhong Zheng, Long Wang, Haisu Zhang.
  • [IGARSS 2023] | A Texture and Saliency Enhanced Image Learning Method For Cross-Modal Remote Sensing Image-Text Retrieval | [paper]

    • Rui Yang, Di Zhang, Yanhe Guo, Shuang Wang.
  • [IGARSS 2023] | A Fast and Accurate Method for Remote Sensing Image-Text Retrieval Based On Large Model Knowledge Distillation | [paper]

    • Yu Liao, Rui Yang, Tao Xie, Hantong Xing, Dou Quan, Shuang Wang, B. Hou.
  • [TGRS 2023] | Knowledge-Aided Momentum Contrastive Learning for Remote-Sensing Image Text Retrieval | [paper]

    • Zhong Ji, Changxu Meng, Yan Zhang, Yanwei Pang, Xuelong Li.
  • [Mathematics 2023] | An End-to-End Framework Based on Vision-Language Fusion for Remote Sensing Cross-Modal Text-Image Retrieval | [paper]

    • Liu He, Shuyan Liu, Ran An, Yudong Zhuo, Jian Tao.
  • [TGRS 2023] | Hypersphere-based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning | [paper]

    • Weihang Zhang, Jihao Li, Shuoke Li, Jialiang Chen, Wenkai Zhang, Xin Gao, Xian Sun.
  • [TGRS 2023] | Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval | [paper]

    • Xu Tang, Yijing Wang, Jingjing Ma, Xiangrong Zhang, F. Liu, Licheng Jiao.
  • [ICMR 2023] | Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | [paper] [github]

    • Jiancheng Pan, Qing Ma, Cong Bai.
  • [CDCEO 2022] | Knowledge-Aware Cross-Modal Text-Image Retrieval for Remote Sensing Images | [paper]

    • Li Mi, Siran Li, Christel Chappuis, D. Tuia.
  • [IGARSS 2022] | A transformer-based cross-modal image-text retrieval method using feature decoupling and reconstruction | [paper]

    • Huan Zhang, Yingzhi Sun, Yu Liao, Siyuan Xu, R. Yang, Shuang Wang, B. Hou, Licheng Jiao.
  • [INT J APPL EARTH OBS 2022] | MCRN: A Multi-source Cross-modal Retrieval Network for remote sensing | [paper]

    • Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Yongqiang Mao, Ruixue Zhou, Hongqi Wang, K. Fu, Xian Sun.
  • [JSTARS 2022] | Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval | [paper]

    • Mohamad Mahmoud Al Rahhal, Y. Bazi, Norah A. Alsharif, Laila Bashmal, N. Alajlan, F. Melgani.
  • [Applied Sciences 2022] | Contrasting Dual Transformer Architectures for Multi-Modal Remote Sensing Image Retrieval | [paper]

    • Mohamad Mahmoud Al Rahhal, M. Bencherif, Y. Bazi, Abdullah Alharbi, M. L. Mekhalfi.
  • [TGRS 2022] | Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information | [paper] [github]

    • Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Xuee Rong, Zhengyuan Zhang, Hongqi Wang, K. Fu, Xian Sun.
  • [TGRS 2021] | A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing | [paper]

    • Zhiqiang Yuan, Wenkai Zhang, Xuee Rong, Xuan Li, Jialiang Chen, Hongqi Wang, K. Fu, Xian Sun.
  • [TGRS 2021] | Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | [paper] [github]

    • Zhiqiang Yuan, Wenkai Zhang, K. Fu, Xuan Li, Chubo Deng, Hongqi Wang, Xian Sun.
  • [JSTARS 2021] | A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing | [paper]

    • Qimin Cheng, Yuzhuo Zhou, Peng Fu, Yuan Xu, Liang Zhang.
  • [LGRS 2021] | Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval | [paper]

    • Yafei Lv, Wei Xiong, Xiaohan Zhang, Yaqi Cui.
  • [Remote Sensing 2020] | TextRS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images | [paper]

    • T. M. Ali, Y. Bazi, Mohamad Mahmoud Al Rahhal, M. L. Mekhalfi, Lalitha Rangarajan, M. Zuair.

Hashing Method

  • [JSTARS 2022] | Remote Sensing Cross-Modal Retrieval by Deep Image-Voice Hashing | [paper]

    • Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu.
  • [ArXiv 2022] | Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing | [paper]

    • Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir.
  • [ICIP 2022] | An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | [paper]

    • Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir.