This software project accompanies the research paper, AutoFocusFormer: Image Segmentation off the Grid (CVPR 2023).
Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin
arXiv | video narration | AFF-Classification | AFF-Segmentation (this repo)
AutoFocusFormer (AFF) is the first adaptive-downsampling network capable of dense prediction tasks such as semantic/instance segmentation.
AFF abandons the traditional grid structure of image feature maps, and automatically learns to retain the most important pixels with respect to the task goal.
AFF consists of a local-attention transformer backbone and a task-specific head. The backbone consists of four stages, each stage containing three modules: balanced clustering, local-attention transformer blocks, and adaptive downsampling.
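For intuition only, the toy sketch below mimics the data flow of one such stage on a random 2D point set: tokens are grouped into neighborhoods, attention runs within each group, and a learned score decides which tokens survive to the next stage. It is not the repository's implementation and does not reproduce the paper's balanced clustering or adaptive downsampling.
```python
# Toy illustration of the three-module stage pattern (cluster -> local attention -> keep a
# learned subset of tokens). All names and shapes here are made up for illustration.
import torch
import torch.nn as nn

def toy_clusters(pos, num_clusters):
    """Assign each token to the nearest of a few randomly chosen centers."""
    centers = pos[torch.randperm(pos.shape[0])[:num_clusters]]
    return torch.cdist(pos, centers).argmin(dim=1)            # (N,) cluster id per token

class ToyStage(nn.Module):
    def __init__(self, dim, keep_ratio=0.5, num_clusters=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)                        # learned per-token importance
        self.keep_ratio, self.num_clusters = keep_ratio, num_clusters

    def forward(self, pos, feat):
        cid = toy_clusters(pos, self.num_clusters)            # 1) clustering
        out = feat.clone()
        for c in cid.unique():                                # 2) attention within each cluster
            idx = (cid == c).nonzero(as_tuple=True)[0]
            x = feat[idx].unsqueeze(0)                        # (1, cluster_size, dim)
            out[idx] = self.attn(x, x, x)[0].squeeze(0)
        k = max(1, int(self.keep_ratio * pos.shape[0]))       # 3) keep the top-scoring tokens
        keep = self.score(out).squeeze(-1).topk(k).indices
        return pos[keep], out[keep]

pos, feat = torch.rand(64, 2), torch.rand(64, 32)             # 64 tokens with 2D positions
new_pos, new_feat = ToyStage(dim=32)(pos, feat)
print(new_pos.shape, new_feat.shape)                          # 32 tokens survive the stage
```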
AFF demonstrates significant FLOPs savings (see our models with 1/5 downsampling rate) and significant improvements in the recognition of small objects.
Notably, AFF-Small achieves 44.0 instance segmentation AP and 66.9 panoptic segmentation PQ on Cityscapes val with a backbone of only 42.6M parameters, on par with Swin-Large, a backbone with 197M parameters (a 78% saving!).
This repository contains the AFF backbone and the point cloud-version of the Mask2Former segmentation head.
We also add a few convenient functionalities, such as visualizing prediction results on blurred versions of the images and evaluating on cocofied LVIS v1 annotations.
ADE20K Semantic Segmentation (val)
backbone | method | pretrain | crop size | mIoU | FLOPs | checkpoint |
---|---|---|---|---|---|---|
AFF-Mini | Mask2Former | ImageNet-1K | 512x512 | 46.5 | 48.3G | Apple ML |
AFF-Mini-1/5 | Mask2Former | ImageNet-1K | 512x512 | 46.0 | 39.9G | Apple ML |
AFF-Tiny | Mask2Former | ImageNet-1K | 512x512 | 50.2 | 64.6G | Apple ML |
AFF-Tiny-1/5 | Mask2Former | ImageNet-1K | 512x512 | 50.0 | 51.1G | Apple ML |
AFF-Small | Mask2Former | ImageNet-1K | 512x512 | 51.2 | 87G | Apple ML |
AFF-Small-1/5 | Mask2Former | ImageNet-1K | 512x512 | 51.9 | 67.2G | Apple ML |
Cityscapes Instance Segmentation (val)
backbone | method | pretrain | AP | checkpoint |
---|---|---|---|---|
AFF-Mini | Mask2Former | ImageNet-1K | 40.0 | Apple ML |
AFF-Tiny | Mask2Former | ImageNet-1K | 42.7 | Apple ML |
AFF-Small | Mask2Former | ImageNet-1K | 44.0 | Apple ML |
AFF-Base | Mask2Former | ImageNet-22K | 46.2 | Apple ML |
Cityscapes Panoptic Segmentation (val)
backbone | method | pretrain | PQ (single-scale) | checkpoint |
---|---|---|---|---|
AFF-Mini | Mask2Former | ImageNet-1K | 62.7 | Apple ML |
AFF-Tiny | Mask2Former | ImageNet-1K | 65.7 | Apple ML |
AFF-Small | Mask2Former | ImageNet-1K | 66.9 | Apple ML |
AFF-Base | Mask2Former | ImageNet-22K | 67.7 | Apple ML |
COCO Instance Segmentation (val)
backbone | method | pretrain | epochs | AP | FLOPs | checkpoint |
---|---|---|---|---|---|---|
AFF-Mini | Mask2Former | ImageNet-1K | 50 | 42.3 | 148G | Apple ML |
AFF-Mini-1/5 | Mask2Former | ImageNet-1K | 50 | 42.3 | 120G | Apple ML |
AFF-Tiny | Mask2Former | ImageNet-1K | 50 | 45.3 | 204G | Apple ML |
AFF-Tiny-1/5 | Mask2Former | ImageNet-1K | 50 | 44.5 | 152G | Apple ML |
AFF-Small | Mask2Former | ImageNet-1K | 50 | 46.4 | 281G | Apple ML |
AFF-Small-1/5 | Mask2Former | ImageNet-1K | 50 | 45.7 | 206G | Apple ML |
```bash
git clone git@github.com:apple/ml-autofocusformer-segmentation.git
cd ml-autofocusformer-segmentation
```
One can download the pre-trained checkpoints through the links in the tables above.
```bash
sh create_env.sh
```
See further documentation inside the script file.
Our experiments are run with `CUDA==11.6` and `pytorch==1.12`.
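To double-check the resulting environment, a quick sanity check like the following (not part of the repository) prints the installed versions:
```python
# Optional environment sanity check; the expected versions in the comments are the ones
# used in our experiments, not hard requirements enforced by the code.
import torch

print("pytorch:", torch.__version__)            # expected: 1.12.x
print("cuda:", torch.version.cuda)              # expected: 11.6
print("cuda available:", torch.cuda.is_available())
```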
Please refer to dataset README.
Use `tools/convert-pretrained-model-to-d2.py` to convert any torch checkpoint `.pth` file trained on ImageNet into a Detectron2 model zoo format `.pkl` file.
```bash
python tools/convert-pretrained-model-to-d2.py aff_mini.pth aff_mini.pkl
```
Otherwise, d2 will assume the checkpoint is for the entire segmentation model and will not add `backbone.` to the parameter names, and thus the checkpoint will not be loaded properly.
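For reference, the conversion conceptually re-packages the ImageNet-pretrained state dict into the dictionary layout expected by Detectron2's model zoo loader. The sketch below conveys the idea only; the actual script in `tools/` may handle checkpoint keys differently, and the `"model"` key inside the `.pth` file is an assumption.
```python
# Minimal sketch: wrap an ImageNet-pretrained state dict in the Detectron2 model-zoo .pkl
# layout so d2 treats it as a backbone-only checkpoint (not the exact tools/ script).
import pickle
import sys

import torch

if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]
    ckpt = torch.load(src, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)      # tolerate both wrapped and raw state dicts
    out = {
        "model": state_dict,
        "__author__": "third_party",
        "matching_heuristics": True,          # ask d2 to match parameter names heuristically
    }
    with open(dst, "wb") as f:
        pickle.dump(out, f)
```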
Modify the arguments in script `run_aff_segmentation.sh` and run
```bash
sh run_aff_segmentation.sh
```
for training or evaluation. One can also directly modify the config files in `configs/`.
See script `run_demo.sh`. More details can be found in Mask2Former GETTING_STARTED.md.
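As a rough illustration of the blurred-image visualization mentioned above, a helper along the following lines (a sketch, not the code behind `run_demo.sh`) overlays instance predictions on a blurred copy of the input image; `outputs` and `metadata` are assumed to be a standard detectron2 prediction dict and dataset metadata.
```python
# Sketch only: draw instance predictions on a Gaussian-blurred copy of the input image
# using detectron2's Visualizer. Assumes `outputs["instances"]` follows the usual d2 format.
import cv2
from detectron2.utils.visualizer import Visualizer

def visualize_on_blurred(image_bgr, outputs, metadata):
    blurred = cv2.GaussianBlur(image_bgr, (31, 31), 0)         # blur the whole image
    vis = Visualizer(blurred[:, :, ::-1], metadata)            # Visualizer expects RGB
    result = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
    return result.get_image()[:, :, ::-1]                      # back to BGR for cv2.imwrite
```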
See tools README.
@inproceedings{autofocusformer,
title = {AutoFocusFormer: Image Segmentation off the Grid},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
author = {Ziwen, Chen and Patnaik, Kaushik and Zhai, Shuangfei and Wan, Alvin and Ren, Zhile and Schwing, Alex and Colburn, Alex and Fuxin, Li},
year = {2023},
}