BoundaryFormer

Code for CVPR2022 paper: Instance Segmentation with Mask-supervised Polygonal Boundary Transformers


Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers

From Justin Lazarow (UCSD, now at Apple), Weijian Xu (UCSD, now at Microsoft), and Zhuowen Tu (UCSD).

This repository is an official implementation of the paper Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers presented at CVPR 2022.

Introduction

BoundaryFormer aims to provide a simple baseline for regression-based instance segmentation. Notably, we use Transformers to regress a fixed number of points along a simple polygonal boundary. This process makes continuous predictions and is thus end-to-end differentiable. Our method differs from previous work in the field in two main ways: it matches Mask R-CNN in Mask AP for the first time, and it imposes no supervision or ground-truth requirements beyond those of Mask R-CNN. That is, our method achieves parity with mask-based baselines in both mask quality and supervision. We accomplish this by relying solely on a differentiable rasterization module (implemented in CUDA), which only requires access to ground-truth masks. We hope this can serve to drive further work in this area.

Installation

BoundaryFormer uses the same installation process as Detectron2. Please see the Detectron2 installation instructions. This should generally require something like:

pip install -ve .

at the root of the source tree (as long as PyTorch, etc. are installed correctly).

BoundaryFormer also uses the deformable attention modules introduced in Deformable-DETR. If this is already installed on your system, no action is needed. Otherwise, please build their modules:

git clone https://github.com/fundamentalvision/Deformable-DETR
cd Deformable-DETR/models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Getting Started

BoundaryFormer follows the general guidelines of Detectron2; however, it lives under projects/BoundaryFormer.

Please make sure to set two additional environment variables on your system:

export DETECTRON2_DATASETS=/path/to/datasets
export DETECTRON2_OUTPUTS=/path/to/outputs

For instance, to train on COCO using an R50 backbone at a 1x schedule:

python projects/BoundaryFormer/train_net.py --num-gpus 8 --config-file projects/BoundaryFormer/configs/COCO-InstanceSegmentation/boundaryformer_rcnn_R_50_FPN_1x.yaml COMMENT "hello model"

If you do not have 8 GPUs, adjust --num-gpus and your BATCH_SIZE accordingly. BoundaryFormer is trained with AdamW, and we find the square-root scaling law to work well (i.e., a batch size of 8 should only induce a sqrt(2) change in the learning rate).
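The square-root rule above can be sketched as a small helper. This is only an illustration: the reference learning rate and reference batch size below are placeholder assumptions, not values read from the released configs, so substitute the ones from the config file you actually train with.

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale a learning rate by the square root of the batch-size ratio."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * math.sqrt(new_batch_size / base_batch_size)


# Hypothetical reference values (assumptions, not taken from the configs).
reference_lr = 1e-4
reference_batch_size = 16

# Halving the batch size divides the learning rate by sqrt(2),
# rather than by 2 as the linear scaling rule would.
halved_lr = sqrt_scaled_lr(reference_lr, reference_batch_size, 8)
print(f"{halved_lr:.3e}")
```

The helper works in both directions, so a larger batch size yields a proportionally larger learning rate under the same rule.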

Relevant Hyperparameters/Configuration Options

BoundaryFormer has a few hyperparameter options. Generally, these are configured under cfg.MODEL.BOUNDARY_HEAD (see projects/BoundaryFormer/boundary_former/config.py). Please see the paper for ablations of these values.

Number of layers

cfg.MODEL.BOUNDARY_HEAD.NUM_DEC_LAYERS = 4

We generally find that 4 layers are sufficient for good performance. Reducing this to 3 loses a small amount of performance, while increasing it generally does not change performance.

NOTE: if upsampling is used, this value is generally ignored and the number of layers is instead computed from a combination of cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS and cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING_BASE_NUM_PTS.

Number of control points

cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS = 64

This defines the number of points at the final output layer. If upsampling (see next section) is not used, this also constitutes the number of points at any intermediate layer. Generally, we find Cityscapes to benefit from more than 64 points (e.g. 128) but COCO less so.

Upsampling behavior

Upsampling constitutes our coarse-to-fine strategy which can reduce memory and computation. Rather than using the same number of points at each layer, we start off with a small number of points and upsample (2x) the points in a naive manner (midpoints) at each subsequent layer. To enable:

cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING = True
cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING_BASE_NUM_PTS = 8
cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS = 64

This will create a 4-layer (8 * 2 ** 3 = 64) coarse-to-fine model.
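To make the arithmetic concrete, here is a minimal pure-Python sketch of the layer count and of midpoint upsampling on a closed polygon. The function names are hypothetical and the code is not taken from the repository, which operates on batched tensors; it only illustrates the relationship described above.

```python
def num_coarse_to_fine_layers(base_num_pts: int, final_num_pts: int) -> int:
    """Count the layers needed to reach final_num_pts by doubling base_num_pts."""
    if base_num_pts <= 0 or final_num_pts < base_num_pts:
        raise ValueError("need 0 < base_num_pts <= final_num_pts")
    layers, pts = 1, base_num_pts
    while pts < final_num_pts:
        pts *= 2  # each subsequent layer sees twice as many points
        layers += 1
    if pts != final_num_pts:
        raise ValueError("final_num_pts must be base_num_pts times a power of two")
    return layers


def upsample_midpoints(points):
    """Insert the midpoint of every edge of a closed polygon (2x upsampling)."""
    upsampled = []
    n = len(points)
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % n]  # wrap around: the last edge closes the polygon
        upsampled.append((x0, y0))
        upsampled.append(((x0 + x1) / 2, (y0 + y1) / 2))
    return upsampled


print(num_coarse_to_fine_layers(8, 64))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(upsample_midpoints(square))
```

With a base of 8 points and a final count of 64, the points per layer are 8, 16, 32, and 64, which matches the 4-layer model above.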

Rasterization resolution

BoundaryFormer uses differentiable rasterization to transform the predicted polygons into mask space for supervision. To control the resolution:

cfg.MODEL.DIFFRAS.RESOLUTIONS = [64, 64]

Here, RESOLUTIONS is a flattened list of X and Y resolutions. It can be set per layer by expanding it. For a two-layer model:

cfg.MODEL.DIFFRAS.RESOLUTIONS = [32, 32, 64, 64]

would supervise the first layer at 32 x 32 and the second at 64 x 64.
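The flattened layout can be read as consecutive (X, Y) pairs, one per layer. The helper below is a hypothetical sketch of that interpretation, not the repository's own parsing code; in particular, reusing a single pair for every layer is an assumption based on the [64, 64] example above.

```python
def resolutions_per_layer(flat_resolutions, num_layers):
    """Group a flattened [x0, y0, x1, y1, ...] list into one (x, y) pair per layer."""
    if len(flat_resolutions) == 0 or len(flat_resolutions) % 2 != 0:
        raise ValueError("expected a non-empty list with an even number of entries")
    pairs = [
        (flat_resolutions[i], flat_resolutions[i + 1])
        for i in range(0, len(flat_resolutions), 2)
    ]
    if len(pairs) == 1:
        # Assumption: a single pair applies to every layer.
        return pairs * num_layers
    if len(pairs) != num_layers:
        raise ValueError("expected one (x, y) pair per layer")
    return pairs


print(resolutions_per_layer([32, 32, 64, 64], num_layers=2))
print(resolutions_per_layer([64, 64], num_layers=4))
```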

Rasterization smoothness

In the same way as SoftRas, we require some rasterization smoothness to differentiably rasterize the masks.

    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_SCHED = (0.001,)

This setting produces quite sharp rasterization (larger values are "blurrier"), which seems to work well. The value can also be made to depend on the current iteration:

    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_SCHED = (0.15, 0.005)
    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_STEPS = (50000,)

This starts with 0.15 and drops to 0.005 at iteration 50000. This hyperparameter is not particularly sensitive in our experience, although values that are too large will decrease performance.
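A step schedule of this kind can be looked up as sketched below. The function name is hypothetical, and the exact boundary behavior (switching on the step iteration itself) is an assumption read from the description above rather than from the repository's code.

```python
from bisect import bisect_right


def inv_smoothness_at(iteration, sched, steps=()):
    """Return the scheduled value for an iteration, given N values and N - 1 steps."""
    if len(sched) != len(steps) + 1:
        raise ValueError("sched must have exactly one more entry than steps")
    # bisect_right counts how many step boundaries are <= iteration,
    # which is the index of the value currently in effect.
    return sched[bisect_right(steps, iteration)]


sched = (0.15, 0.005)
steps = (50000,)

for it in (0, 49999, 50000, 90000):
    print(it, inv_smoothness_at(it, sched, steps))
```

A constant schedule such as (0.001,) is the special case with no steps, so the same lookup covers both configurations.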

Model Zoo

We release models for MS-COCO and Cityscapes.

COCO

| Mask head | Backbone | lr sched | Control points | mask AP | download |
|---|---|---|---|---|---|
| BoundaryFormer | R50-FPN | | 64 | 36.1 | model |

Cityscapes

| Mask head | Backbone | lr sched | Control points | initialization | mask AP | download |
|---|---|---|---|---|---|---|
| BoundaryFormer | R50-FPN | | 64 | ImageNet | 34.7 | model |
| BoundaryFormer | R50-FPN | | 64 | COCO | 38.3 | model |

License

BoundaryFormer uses Detectron2 and is further released under the Apache 2.0 license.

Citing BoundaryFormer

If you use BoundaryFormer in your research, please use the following BibTeX entry.

@InProceedings{Lazarow_2022_CVPR,
    author    = {Lazarow, Justin and Xu, Weijian and Tu, Zhuowen},
    title     = {Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4382-4391}
}