HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Di Wang^{1 ∗}, Meiqi Hu^{1 ∗}, Yao Jin^{1 ∗}, Yuchun Miao^{1 ∗}, Jiaqi Yang^{1 ∗}, Yichu Xu^{1 ∗}, Xiaolei Qin^{1 ∗}, Jiaqi Ma^{1 ∗}, Lingyu Sun^{1 ∗}, Chenxing Li^{1 ∗}, Chuan Fu², Hongruixuan Chen³, Chengxi Han^{1 †}, Naoto Yokoya³, Jing Zhang^{1 †}, Minqiang Xu⁴, Lin Liu⁴, Lefei Zhang¹, Chen Wu^{1 †}, Bo Du^{1 †}, Dacheng Tao⁵, Liangpei Zhang^{1 †}

¹ Wuhan University, ² Chongqing University, ³ The University of Tokyo, ⁴ National Engineering Research Center of Speech and Language Information Processing, ⁵ Nanyang Technological University.

^∗ Equal contribution, ^† Corresponding author

🔥 Update

2024.10.22

Released scripts for image super-resolution.
Released checkpoints for image denoising.

2024.07.18

Models can be downloaded from both Baidu Drive (百度网盘) and Hugging Face 🤗.
Datasets for HSI denoising have been released for research use only. Please check them here.

2024.06.18

The paper is posted on arXiv! (arXiv 2406.11519)

🌞 Overview

HyperSIGMA is the first billion-level foundation model specifically designed for hyperspectral image (HSI) interpretation. To tackle the spectral and spatial redundancy in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module.

Figure 1. Framework of HyperSIGMA.

Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA’s versatility and superior representational capability compared to current state-of-the-art methods. It outperforms advanced models like SpectralGPT, even those specifically designed for these tasks.

Figure 2. HyperSIGMA demonstrates superior performance across 16 datasets and 7 tasks, including both high-level and low-level hyperspectral tasks, as well as multispectral scenes.

📖 Datasets

To train the foundation model, we collected hyperspectral remote sensing image samples from around the globe and constructed a large-scale hyperspectral dataset named HyperGlobal-450K for pre-training. HyperGlobal-450K contains over 20 million three-band images, far exceeding the scale of existing hyperspectral datasets.

Figure 3. The distribution of HyperGlobal-450K samples across the globe, comprising 1,701 images (1,486 EO-1 and 215 GF-5B) with hundreds of spectral bands.

🚀 Pretrained Models

Pretrain	Backbone	Model Weights
Spatial_MAE	ViT-B	Baidu Drive & Hugging Face
Spatial_MAE	ViT-L	Baidu Drive & Hugging Face
Spatial_MAE	ViT-H	Baidu Drive & Hugging Face
Spectral_MAE	ViT-B	Baidu Drive & Hugging Face
Spectral_MAE	ViT-L	Baidu Drive & Hugging Face
Spectral_MAE	ViT-H	Baidu Drive & Hugging Face

🔨 Usage

Pretraining

We pretrain HyperSIGMA with SLURM. The following example pretrains the large version of Spatial ViT:

srun -J spatmae -p xahdnormal --gres=dcu:4 --ntasks=64 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spat.py \
--model 'spat_mae_l' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 32 --gpu_num 64 --port 60001
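The `--blr`, `--batch_size` and `--gpu_num` flags in the command above interact. This repository builds on MAE, whose reference training script treats `--blr` as a base learning rate and scales it by the total batch size divided by 256. Assuming HyperSIGMA keeps that convention and that `--batch_size` is counted per process (both are assumptions, not confirmed by this README), the absolute learning rate could be estimated with the sketch below. The function name `effective_lr` and the `accum_iter` parameter are illustrative and not part of the repository.

```python
def effective_lr(blr: float, batch_size: int, gpu_num: int, accum_iter: int = 1) -> float:
    """Estimate the absolute learning rate under the MAE linear scaling rule.

    Assumption: lr = blr * total_batch / 256, as in the MAE reference code.
    """
    if min(batch_size, gpu_num, accum_iter) < 1:
        raise ValueError("batch_size, gpu_num and accum_iter must be positive")
    # Total number of samples contributing to one optimizer step.
    total_batch = batch_size * gpu_num * accum_iter
    return blr * total_batch / 256


if __name__ == "__main__":
    # Values taken from the Spatial ViT-L command above.
    lr = effective_lr(blr=1.5e-4, batch_size=32, gpu_num=64)
    print(f"total batch = {32 * 64}, estimated lr = {lr:.2e}")
```

Under these assumptions, the command above corresponds to a total batch of 2048 samples and a learning rate of roughly 1.2e-3. Halving `--batch_size` while doubling `--gpu_num` leaves both values unchanged.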

Another example, pretraining the huge version of Spectral ViT:

srun -J specmae -p xahdnormal --gres=dcu:4 --ntasks=128 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spec.py \
--model 'spec_mae_h' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 16 --gpu_num 128 --port 60004 --epochs 1600 --mask_ratio 0.75 \
--use_ckpt 'True'
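The `--mask_ratio 0.75` flag in the command above matches the masked autoencoder (MAE) recipe this project builds on, in which a random 75% of the tokens are hidden and the model learns to reconstruct them. The plain-Python sketch below illustrates that selection step only. It is not the repository's code: `random_masking`, the token count of 196 and the seed are assumptions chosen for illustration, and the kept indices are sorted here for readability.

```python
import random
from typing import List, Optional, Tuple


def random_masking(
    num_tokens: int, mask_ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Return (keep_idx, mask) for MAE-style random masking of one sample."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    len_keep = int(num_tokens * (1 - mask_ratio))
    # Rank tokens by random noise; the lowest-noise tokens stay visible.
    noise = [rng.random() for _ in range(num_tokens)]
    order = sorted(range(num_tokens), key=noise.__getitem__)
    keep_idx = sorted(order[:len_keep])
    kept = set(keep_idx)
    # mask[i] == 1 means token i is hidden and must be reconstructed.
    mask = [0 if i in kept else 1 for i in range(num_tokens)]
    return keep_idx, mask


if __name__ == "__main__":
    keep_idx, mask = random_masking(num_tokens=196, mask_ratio=0.75, seed=0)
    print(len(keep_idx), sum(mask))  # 49 visible tokens, 147 masked tokens
```

With a ratio of 0.75, only a quarter of the tokens remain visible, which is what makes MAE-style pretraining comparatively cheap per sample.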

Training can be resumed from a saved checkpoint by adding the --resume option to the command:

--resume [path of saved model]
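To show where `--resume` fits, the sketch below assembles the Spatial ViT-L command from the first example and optionally appends the checkpoint path. It is a hypothetical helper (`build_pretrain_cmd` is not part of the repository) that only uses flags appearing in this README. It sets `--ntasks` equal to `--gpu_num` because both examples do so, which is an observation rather than a documented requirement, and the paths in the usage lines are placeholders.

```python
import shlex
from typing import List, Optional


def build_pretrain_cmd(
    data_path: str,
    output_dir: str,
    log_dir: str,
    gpu_num: int = 64,
    batch_size: int = 32,
    resume: Optional[str] = None,
) -> List[str]:
    """Build the srun argument list for Spatial ViT-L pretraining."""
    cmd = [
        "srun", "-J", "spatmae", "-p", "xahdnormal", "--gres=dcu:4",
        f"--ntasks={gpu_num}", "--ntasks-per-node=4", "--cpus-per-task=8",
        "--kill-on-bad-exit=1",
        "python", "main_pretrain_Spat.py",
        "--model", "spat_mae_l", "--norm_pix_loss",
        "--data_path", data_path,
        "--output_dir", output_dir,
        "--log_dir", log_dir,
        "--blr", "1.5e-4", "--batch_size", str(batch_size),
        "--gpu_num", str(gpu_num), "--port", "60001",
    ]
    if resume is not None:
        # Continue from a previously saved checkpoint.
        cmd += ["--resume", resume]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with real locations on your cluster.
    cmd = build_pretrain_cmd("data/", "out/", "logs/", resume="out/checkpoint.pth")
    print(shlex.join(cmd))
```

The partition name `xahdnormal` and the `--gres=dcu:4` resource string are copied from the example and are specific to the authors' cluster, so they will likely need adjusting elsewhere.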

Finetuning

Image Classification:

Please refer to ImageClassification-README.

Target Detection & Anomaly Detection:

Please refer to HyperspectralDetection-README.

Change Detection:

Please refer to ChangeDetection-README.

Spectral Unmixing:

Please refer to HyperspectralUnmixing-README.

Denoising:

Please refer to Denoising-README.

Super-Resolution:

Please refer to SR-README.

Multispectral Change Detection:

Please refer to MultispectralCD-README.

⭐ Citation

If you find HyperSIGMA helpful, please consider giving this repo a ⭐ and citing:

@article{hypersigma,
  title={HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model},
  author={Wang, Di and Hu, Meiqi and Jin, Yao and Miao, Yuchun and Yang, Jiaqi and Xu, Yichu and Qin, Xiaolei and Ma, Jiaqi and Sun, Lingyu and Li, Chenxing and Fu, Chuan and Chen, Hongruixuan and Han, Chengxi and Yokoya, Naoto and Zhang, Jing and Xu, Minqiang and Liu, Lin and Zhang, Lefei and Wu, Chen and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal={arXiv preprint arXiv:2406.11519},
  year={2024}
}

🎺 Statement

For any other questions, please contact di.wang at gmail.com or whu.edu.cn, and chengxi.han at whu.edu.cn.

💖 Thanks

This project is based on MMCV, MAE, Swin Transformer, VSA, RVSA, DAT, HTD-IRN, GT-HAD, MSDformer, SST-Former, SST, CNNAEU and DeepTrans. Thanks for their wonderful work!

WHU-Sigma/HyperSIGMA