Dynamic Focus-aware Positional Queries for Semantic Segmentation

[CVPR 2023] This is the official repository for our paper: Dynamic Focus-aware Positional Queries for Semantic Segmentation by Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, Dacheng Tao and Bohan Zhuang.

🚀 News

[2022-06-07]: Release code.

[2022-12-03]: Add Swin-B result.

[2023-02-28]: Accepted to CVPR 2023!

Introduction

We propose a simple yet effective query design for semantic segmentation under DETR-like frameworks: the positional queries are aggregated from the cross-attention scores and the localization information of the preceding layer. Each query is therefore aware of its previous focus, which provides more accurate positional guidance and encourages cross-attention consistency across the decoder layers.
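To make the aggregation idea concrete, here is a minimal sketch in plain Python. It assumes that the "localization information" is a positional encoding per image location and that the aggregation is an attention-weighted sum; the function names and the toy shapes are illustrative only and are not taken from this repository's code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def focus_aware_positional_queries(attn_logits, key_pos):
    """Aggregate positional encodings with the previous layer's attention.

    attn_logits: one row per query, one logit per image location
                 (assumed to come from the preceding decoder layer).
    key_pos:     one positional encoding (list of floats) per image location.
    Returns one positional query per row of attn_logits.
    """
    dim = len(key_pos[0])
    queries = []
    for logits in attn_logits:
        weights = softmax(logits)
        # Weighted sum of the positional encodings, one dimension at a time.
        queries.append(
            [sum(w * pos[d] for w, pos in zip(weights, key_pos)) for d in range(dim)]
        )
    return queries


# Toy example: 2 queries, 3 image locations, 2-D positional encodings.
key_pos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_logits = [
    [50.0, 0.0, 0.0],  # query 0 focuses almost entirely on location 0
    [0.0, 0.0, 0.0],   # query 1 attends uniformly to all locations
]
pos_queries = focus_aware_positional_queries(attn_logits, key_pos)
print(pos_queries)
```

In this toy setting, a query that focused sharply on one location receives a positional query close to that location's encoding, while a query with diffuse attention receives an average of the encodings.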

Experimental results

We provide single-seed experimental results and pre-trained models for FASeg. In the tables, s.s. and m.s. denote single-scale and multi-scale inference:

ADE20k val	Backbone	Crop size	mIoU s.s. (%)	mIoU m.s. (%)	Params. (M)	FLOPs	Model
FASeg w/ conditional K_p	R50	512x512	48.3	49.3	51	72G	model
FASeg w/ conditional K_p	Swin-T	512x512	49.6	51.3	54	75G	model
FASeg w/ conditional K_p	Swin-B	640x640	55.0	56.0	113	225G	model
FASeg w/ conditional K_p	Swin-L	640x640	56.3	57.7	228	405G	model

Cityscapes val	Backbone	Crop size	mIoU s.s. (%)	Params. (M)	FLOPs	Model
FASeg w/ conditional K_p	R50	1024x2048	80.5	67	533G	model
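For readers unfamiliar with the metric in the tables above, the following is a generic sketch of mean intersection-over-union computed from a confusion matrix. It illustrates the standard definition only; it is not the evaluator used by this repository, which may differ in details such as the handling of ignored labels. The function name and the toy matrix are hypothetical.

```python
def mean_iou(confusion):
    """Mean IoU in percent.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)


# Toy two-class example.
confusion = [
    [8, 2],  # ground-truth class 0: 8 correct, 2 predicted as class 1
    [1, 9],  # ground-truth class 1: 1 predicted as class 0, 9 correct
]
score = mean_iou(confusion)
print(f"mIoU: {score:.2f}%")
```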

Installation

See installation instructions for Mask2Former.

Get started

We provide training scripts for deriving all of our models:

# Train FASeg with R50 backbone and 8 GPUs on ADE20k:
python train_net.py --num-gpus 8 \
  --config-file configs/ade20k/semantic-segmentation/faseg_r50.yaml
  
# Train FASeg with Swin-T backbone and 8 GPUs on ADE20k:  
python train_net.py --num-gpus 8 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_tiny.yaml
  
# Train FASeg with Swin-B backbone and 8 GPUs on ADE20k:  
python train_net.py --num-gpus 8 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_base_IN21k_res640.yaml
  
# Train FASeg with Swin-L backbone and 8 GPUs on ADE20k:  
python train_net.py --num-gpus 8 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_large_IN21k_res640.yaml
  
# Train FASeg with R50 backbone and 8 GPUs on Cityscapes:  
python train_net.py --num-gpus 8 \
  --config-file configs/cityscapes/semantic-segmentation/faseg_r50.yaml

We also provide evaluation scripts for all of our models:

# Evaluate FASeg with R50 backbone and 1 GPU on ADE20k val:
python train_net.py --num-gpus 1 \
  --config-file configs/ade20k/semantic-segmentation/faseg_r50.yaml --eval-only MODEL.WEIGHTS "model/ade_faseg_r50.pth"
  
# Evaluate FASeg with Swin-T backbone and 1 GPU on ADE20k val:
python train_net.py --num-gpus 1 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_tiny.yaml --eval-only MODEL.WEIGHTS "model/ade_faseg_swin_ti.pth"
  
# Evaluate FASeg with Swin-B backbone and 1 GPU on ADE20k val:
python train_net.py --num-gpus 1 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_base_IN21k_res640.yaml --eval-only MODEL.WEIGHTS "model/ade_faseg_swin_b.pth"

# Evaluate FASeg with Swin-L backbone and 1 GPU on ADE20k val:
python train_net.py --num-gpus 1 \
  --config-file configs/ade20k/semantic-segmentation/swin/faseg_swin_large_IN21k_res640.yaml --eval-only MODEL.WEIGHTS "model/ade_faseg_swin_l.pth"
  
# Evaluate FASeg with R50 backbone and 1 GPU on Cityscapes val
# (the weights path below is the ADE20k Swin-L checkpoint name; replace it with your Cityscapes R50 checkpoint):
python train_net.py --num-gpus 1 \
  --config-file configs/cityscapes/semantic-segmentation/faseg_r50.yaml --eval-only MODEL.WEIGHTS "model/ade_faseg_swin_l.pth"
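The training and evaluation commands above all follow the same pattern, so they can be generated from a small lookup table. The sketch below is a convenience wrapper, not part of this repository: the dictionary keys (such as "swin-t") and the function name are hypothetical, while the config paths and flags are copied from the commands above. The checkpoint path is passed in by the caller.

```python
import shlex

# Config paths as listed in the commands above; the keys are illustrative labels.
CONFIGS = {
    ("ade20k", "r50"): "configs/ade20k/semantic-segmentation/faseg_r50.yaml",
    ("ade20k", "swin-t"): "configs/ade20k/semantic-segmentation/swin/faseg_swin_tiny.yaml",
    ("ade20k", "swin-b"): "configs/ade20k/semantic-segmentation/swin/faseg_swin_base_IN21k_res640.yaml",
    ("ade20k", "swin-l"): "configs/ade20k/semantic-segmentation/swin/faseg_swin_large_IN21k_res640.yaml",
    ("cityscapes", "r50"): "configs/cityscapes/semantic-segmentation/faseg_r50.yaml",
}


def build_command(dataset, backbone, num_gpus=8, weights=None):
    """Return the argv list for training, or for evaluation if weights is given."""
    cmd = [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", CONFIGS[(dataset, backbone)],
    ]
    if weights is not None:
        # Evaluation mode: load the given checkpoint and skip training.
        cmd += ["--eval-only", "MODEL.WEIGHTS", weights]
    return cmd


train_cmd = build_command("ade20k", "swin-t")
eval_cmd = build_command("ade20k", "r50", num_gpus=1, weights="model/ade_faseg_r50.pth")
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The returned list can be passed directly to `subprocess.run` from the repository root, which avoids shell quoting issues with checkpoint paths.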

For more usage, please see Getting started with Mask2Former and Getting started with Detectron2.

If you find this repository or our paper useful, please consider citing:

@inproceedings{he2023dynamic,
  title={Dynamic Focus-aware Positional Queries for Semantic Segmentation},
  author={He, Haoyu and Cai, Jianfei and Pan, Zizheng and Liu, Jing and Zhang, Jing and Tao, Dacheng and Zhuang, Bohan},
  booktitle={CVPR},
  year={2023}
}

Acknowledgement

The code is largely based on Mask2Former. We thank the authors for their open-sourced code.