Adaptive Class Suppression Loss for Long-Tail Object Detection
This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection. [Paper]
Requirements
1. Environment:
The requirements are exactly the same as BalancedGroupSoftmax. We tested on the following settings:
- python 3.7
- cuda 10.0
- pytorch 1.2.0
- torchvision 0.4.0
- mmcv 0.2.14
conda create -n mmdet python=3.7 -y
conda activate mmdet
pip install cython
pip install numpy
# pin to the tested versions listed above
pip install torch==1.2.0
pip install torchvision==0.4.0
pip install pycocotools
pip install matplotlib
pip install terminaltables
# download the source code of mmcv 0.2.14 from https://github.com/open-mmlab/mmcv/tree/v0.2.14
cd mmcv-0.2.14
pip install -v -e .
cd ../
git clone https://github.com/CASIA-IVA-Lab/ACSL.git
cd ACSL/lvis-api/
python setup.py develop
cd ../
python setup.py develop
2. Data:
a. For dataset images:
# Make sure you are in dir ACSL
mkdir data
cd data
mkdir lvis
mkdir pretrained_models
mkdir download_models
- If you already have the COCO2017 dataset, link the train2017 and val2017 folders under the folder lvis.
- If you do not have the COCO2017 dataset, please download the COCO train set and COCO val set, unzip these files, and move them under the folder lvis.
b. For dataset annotations:
- Download the LVIS annotations: lvis_train_ann and lvis_val_ann.
- Unzip all the files and put them under the folder lvis.
c. For pretrained models:
Download the corresponding pre-trained models below.
- To train baseline models, we need models trained on COCO for initialization. Please download the corresponding COCO models from the mmdetection model zoo.
- Move these model files to ./data/pretrained_models/
d. For download_models:
Download the trained baseline models and ACSL models from BaiduYun (extraction code: 2jp3).
- To train ACSL models, we initialize from the corresponding baseline models trained on LVIS and fix all parameters except for the last FC layer.
- Move these model files to ./data/download_models/
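To make "fix all parameters except for the last FC layer" concrete, here is a minimal sketch that toggles requires_grad by parameter name. The helper freeze_except and the prefix bbox_head.fc_cls are assumptions for illustration only; what is actually tuned in this repo is determined by its own configs and training code.

```python
from types import SimpleNamespace


def freeze_except(named_params, trainable_prefixes):
    """Set requires_grad=False on every parameter whose name does not start
    with one of trainable_prefixes; return the names left trainable."""
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in named_params:
        keep = name.startswith(prefixes)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch installed; with a
# real model you would pass model.named_parameters() instead.
demo_params = [
    ("backbone.conv1.weight", SimpleNamespace(requires_grad=True)),
    ("bbox_head.fc_reg.weight", SimpleNamespace(requires_grad=True)),
    ("bbox_head.fc_cls.weight", SimpleNamespace(requires_grad=True)),
    ("bbox_head.fc_cls.bias", SimpleNamespace(requires_grad=True)),
]

# "bbox_head.fc_cls" is an assumed name for the final classification layer.
kept = freeze_except(demo_params, ["bbox_head.fc_cls"])
print(kept)
```

Matching by name prefix keeps the sketch independent of any particular detector class; check the parameter names of your own model before relying on a prefix.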
After all these operations, the folder data should look like this:
data
├── lvis
│   ├── lvis_v0.5_train.json
│   ├── lvis_v0.5_val.json
│   ├── train2017
│   │   ├── 000000100582.jpg
│   │   ├── 000000102411.jpg
│   │   ├── ......
│   └── val2017
│       ├── 000000062808.jpg
│       ├── 000000119038.jpg
│       ├── ......
├── pretrained_models
│   ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
│   ├── ......
└── download_models
    ├── R50-baseline.pth
    ├── ......
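Before starting a long training run, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repo; the function name missing_entries and the list EXPECTED are made up for illustration, and only the entries named explicitly in the tree are checked.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the data folder.
EXPECTED = [
    "lvis/lvis_v0.5_train.json",
    "lvis/lvis_v0.5_val.json",
    "lvis/train2017",
    "lvis/val2017",
    "pretrained_models",
    "download_models",
]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the ACSL directory so that "data" resolves correctly.
    missing = missing_entries("data")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```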
Training
Note: Please make sure that you have prepared the pretrained_models and the download_models and that they are placed at the paths specified in ${CONFIG_FILE}.
Use the following commands to train a model.
# Single GPU
python tools/train.py ${CONFIG_FILE}
# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
All config files are under ./configs/.
- ./configs/baselines: configs for all baseline models.
- ./configs/acsl: configs for ACSL models.
For example, to train an ACSL model with Faster R-CNN R50-FPN:
# Single GPU
python tools/train.py configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py
# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py 8
Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)
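The Linear Scaling Rule above is simple arithmetic, sketched below. The function name scaled_lr is hypothetical, and the reference point of lr=0.02 at batch size 16 is an assumption inferred from the two examples in the note (0.01 at batch size 8, 0.08 at batch size 64); check the lr value in your config file before relying on it.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate in proportion to the total batch size.

    base_lr and base_batch_size are assumed defaults (8 GPUs * 2 img/gpu);
    replace them with the values from your own config.
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


# The two cases mentioned in the note above.
print(scaled_lr(4, 2))
print(scaled_lr(16, 4))
```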
Testing
Use the following commands to test a trained model.
# single gpu test
python tools/test_lvis.py \
${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
# multi-gpu testing
./tools/dist_test_lvis.sh \
${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
- $RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- $EVAL_METRICS: Items to be evaluated on the results: bbox for bounding box evaluation only, bbox segm for bounding box and mask evaluation.
For example (assuming that you have finished training the ACSL models):
- To evaluate the trained ACSL model with Faster R-CNN R50-FPN for object detection:
# single-gpu testing
python tools/test_lvis.py configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py \
./work_dirs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl/epoch_12.pth \
--out acsl_val_result.pkl --eval bbox
# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py \
./work_dirs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl/epoch_12.pth 8 \
--out acsl_val_result.pkl --eval bbox
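Since the file written by --out is in pickle format, it can be reopened later with Python's standard pickle module. The sketch below shows only the loading step; the helper load_results is hypothetical, the demo writes a tiny stand-in file rather than real detections, and the assumption that the top-level object is a list with one entry per image should be verified against your own output. Only unpickle files you produced yourself, because loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_results(path):
    """Load a results file that was saved in pickle format."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a stand-in file so the sketch runs on its own; in practice you
# would call load_results("acsl_val_result.pkl").
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_result.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump([[], []], f)  # two empty placeholder entries
    demo = load_results(demo_path)

print(type(demo).__name__, len(demo))
```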
Results and models
Please refer to our paper for more details.
Method | Models | bbox mAP | Config file | Pretrained Model | Model |
---|---|---|---|---|---|
baseline | R50-FPN | 21.18 | file | COCO-R50 | R50-baseline |
ACSL | R50-FPN | 26.36 | file | R50-baseline | R50-acsl |
baseline | R101-FPN | 22.36 | file | COCO-R101 | R101-baseline |
ACSL | R101-FPN | 27.49 | file | R101-baseline | R101-acsl |
baseline | X101-FPN | 24.70 | file | COCO-X101 | X101-baseline |
ACSL | X101-FPN | 28.93 | file | X101-baseline | X101-acsl |
baseline | Cascade-R101 | 25.14 | file | COCO-Cas-R101 | Cas-R101-baseline |
ACSL | Cascade-R101 | 29.71 | file | Cas-R101-baseline | Cas-R101-acsl |
baseline | Cascade-X101 | 27.14 | file | COCO-Cas-X101 | Cas-X101-baseline |
ACSL | Cascade-X101 | 31.47 | file | Cas-X101-baseline | Cas-X101-acsl |
Important: The BaiduYun extraction code is 2jp3.
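To read the table as improvements over each baseline, the bbox mAP differences can be computed directly from the numbers above. This is only arithmetic on the reported values; the names RESULTS and gains are made up for this sketch.

```python
# (baseline bbox mAP, ACSL bbox mAP) copied from the table above.
RESULTS = {
    "R50-FPN": (21.18, 26.36),
    "R101-FPN": (22.36, 27.49),
    "X101-FPN": (24.70, 28.93),
    "Cascade-R101": (25.14, 29.71),
    "Cascade-X101": (27.14, 31.47),
}


def gains(results):
    """Return the ACSL improvement in bbox mAP for each model."""
    return {name: round(acsl - base, 2) for name, (base, acsl) in results.items()}


for name, gain in gains(RESULTS).items():
    print(f"{name}: +{gain:.2f} mAP")
```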
Citation
@inproceedings{wang2021adaptive,
title={Adaptive Class Suppression Loss for Long-Tail Object Detection},
author={Wang, Tong and Zhu, Yousong and Zhao, Chaoyang and Zeng, Wei and Wang, Jinqiao and Tang, Ming},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}
}
Credit
This code is largely based on BalancedGroupSoftmax, mmdetection v1.0.rc0, and the LVIS API.