
PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection

Official implementation of PETDet (under review).

The second-place solution (2/220) in the Fine-grained Object Recognition in High-Resolution Optical Images track of the 2021 Gaofen Challenge on Automated High-Resolution Earth Observation Image Interpretation.

Introduction

Figure: overall architecture of PETDet.

Fine-grained object detection (FGOD) extends object detection with the capability of fine-grained recognition. In recent two-stage FGOD methods, the region proposal serves as a crucial link between detection and fine-grained recognition. However, current methods overlook that some proposal-related procedures inherited from general detection are not equally suitable for FGOD, limiting multi-task learning across proposal generation, representation, and utilization. In this paper, we present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to properly handle these sub-tasks in two-stage FGOD methods. First, we propose an anchor-free Quality Oriented Proposal Network (QOPN) with dynamic label assignment and attention-based decomposition to generate high-quality oriented proposals. Second, we present a Bilinear Channel Fusion Network (BCFN) to extract independent and discriminative features from the proposals. Third, we design a novel Adaptive Recognition Loss (ARL) that guides the R-CNN head to focus on high-quality proposals. Extensive experiments validate the effectiveness of PETDet. Quantitative analysis shows that PETDet with ResNet50 reaches state-of-the-art performance on various FGOD datasets, including FAIR1M-v1.0 (42.96 AP), FAIR1M-v2.0 (48.81 AP), MAR20 (85.91 AP), and ShipRSImageNet (74.90 AP). PETDet also achieves a superior trade-off between accuracy and inference speed. Our code and models will be released at https://github.com/canoe-Z/PETDet.
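The exact formulations of QOPN, BCFN, and ARL are given in the paper. Purely as an illustration of the general idea behind a quality-aware loss such as ARL, the sketch below up-weights the classification loss of high-IoU proposals; the function name, the gamma parameter, and the IoU-power weighting are assumptions for this sketch, not the paper's formulation.

import torch
import torch.nn.functional as F

def quality_weighted_cls_loss(cls_scores, labels, proposal_ious, gamma=2.0):
    """Illustrative quality-aware classification loss (NOT the paper's ARL).

    Up-weights the cross-entropy of proposals that overlap their matched
    ground truth well, so the classification head focuses on high-quality
    proposals.
    """
    # Per-proposal cross-entropy, unreduced so it can be reweighted.
    ce = F.cross_entropy(cls_scores, labels, reduction='none')
    # Higher-IoU proposals get larger weights; gamma sharpens the focus.
    weights = proposal_ious.clamp(min=0, max=1).pow(gamma)
    return (weights * ce).sum() / weights.sum().clamp(min=1e-6)

# Toy usage: 4 proposals, 5 classes.
scores = torch.randn(4, 5)
labels = torch.tensor([0, 2, 1, 4])
ious = torch.tensor([0.9, 0.3, 0.6, 0.8])
print(quality_weighted_cls_loss(scores, labels, ious))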

Results and Models

FAIR1M-v2.0

| Method | Backbone | Angle | lr schd | Aug | Batch Size | AP50 | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :--- |
| Faster R-CNN | ResNet50 (1024,1024,200) | le90 | 1x | - | 2\*4 | 41.64 | model \| log \| submission |
| RoI Transformer | ResNet50 (1024,1024,200) | le90 | 1x | - | 2\*4 | 44.03 | model \| log \| submission |
| Oriented R-CNN | ResNet50 (1024,1024,200) | le90 | 1x | - | 2\*4 | 43.90 | model \| log \| submission |
| ReDet | ReResNet50 (1024,1024,200) | le90 | 1x | - | 2\*4 | 46.03 | model \| log \| submission |
| PETDet | ResNet50 (1024,1024,200) | le90 | 1x | - | 2\*4 | 48.81 | model \| log \| submission |

MAR20

| Method | Backbone | Angle | lr schd | Aug | Batch Size | AP50 | mAP | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| Faster R-CNN | ResNet50 (800,800) | le90 | 3x | - | 2\*4 | 75.01 | 47.57 | model \| log |
| RoI Transformer | ResNet50 (800,800) | le90 | 3x | - | 2\*4 | 82.46 | 56.43 | model \| log |
| Oriented R-CNN | ResNet50 (800,800) | le90 | 3x | - | 2\*4 | 82.71 | 58.14 | model \| log |
| PETDet | ResNet50 (800,800) | le90 | 3x | - | 2\*4 | 85.91 | 61.48 | model \| log |

ShipRSImageNet

| Method | Backbone | Angle | lr schd | Aug | Batch Size | AP50 | mAP | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| Faster R-CNN | ResNet50 (1024,1024) | le90 | 3x | - | 2\*4 | 54.75 | 27.60 | model \| log |
| RoI Transformer | ResNet50 (1024,1024) | le90 | 3x | - | 2\*4 | 60.98 | 33.56 | model \| log |
| Oriented R-CNN | ResNet50 (1024,1024) | le90 | 3x | - | 2\*4 | 71.76 | 51.90 | model \| log |
| PETDet | ResNet50 (1024,1024) | le90 | 3x | - | 2\*4 | 74.90 | 55.69 | model \| log |

Installation

This repo is based on mmrotate 0.x and OBBDetection.

Step 1. Create a conda environment and activate it.

conda create --name petdet python=3.10 -y
conda activate petdet

Step 2. Install PyTorch following the official instructions. PyTorch 1.13.1 is recommended.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

Step 3. Install MMCV 1.x and MMDetection 2.x using MIM.

pip install -U openmim
mim install mmcv-full==1.7.1
mim install mmdet==2.28.2

Step 4. Install PETDet from source.

git clone https://github.com/canoe-Z/PETDet.git
cd PETDet
pip install -v -e .
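
You can sanity-check the environment afterwards. A quick check, assuming the editable install also provides the bundled mmrotate package (as the repo layout suggests):

import torch
import mmcv
import mmdet
import mmrotate

# Versions should roughly match the ones recommended above.
print(torch.__version__, torch.cuda.is_available())
print(mmcv.__version__, mmdet.__version__, mmrotate.__version__)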

Data Preparation

Download the datasets:

For FAIR1M, please crop the original images into 1024×1024 patches with an overlap of 200 pixels by running the split tool (see the sketch below).
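
The repo's split tool also transforms the annotations and should be preferred; purely to illustrate the cropping logic, here is a hypothetical minimal sketch using Pillow (the paths, naming scheme, and PNG output are assumptions):

from pathlib import Path
from PIL import Image

def split_image(img_path, out_dir, patch=1024, overlap=200):
    # Slide a patch-sized window with a (patch - overlap) stride,
    # i.e. an 824-pixel step yields a 200-pixel overlap.
    img = Image.open(img_path)
    w, h = img.size
    stride = patch - overlap
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for x in range(0, max(w - overlap, 1), stride):
        for y in range(0, max(h - overlap, 1), stride):
            box = (x, y, min(x + patch, w), min(y + patch, h))
            img.crop(box).save(out / f'{Path(img_path).stem}__{x}_{y}.png')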

The data structure is as follows:

PETDet
├── mmrotate
├── tools
├── configs
├── data
│   ├── FAIR1M1_0
│   │   ├── train
│   │   ├── test
│   ├── FAIR1M2_0
│   │   ├── train
│   │   ├── val
│   │   ├── test
│   ├── MAR20
│   │   ├── Annotations
│   │   ├── ImageSets
│   │   ├── JPEGImages
│   ├── ShipRSImageNet
│   │   ├── COCO_Format
│   │   ├── VOC_Format

Inference

Assuming you have put the split FAIR1M dataset into data/split_ss_fair1m2_0/ and downloaded the models into weights/, you can evaluate on the FAIR1M-v2.0 test split:

./tools/dist_test.sh \
  configs/petdet/petdet_r50_fpn_1x_fair1m_le90.py \
  weights/petdet_r50_fpn_1x_fair1m_le90.pth 4 \
  --format-only \
  --eval-options submission_dir=work_dirs/FAIR1M_2.0_results

Then, you can upload work_dirs/FAIR1M_2.0_results/submission_zip/test.zip to the ISPRS benchmark.
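
For quick single-image inference outside the distributed test script, the high-level MMDetection 2.x API should work, since mmrotate 0.x builds on it (the config and weight paths below are the ones assumed above; the demo image path is hypothetical):

from mmdet.apis import inference_detector, init_detector
import mmrotate  # noqa: F401  # registers the rotated detectors

config = 'configs/petdet/petdet_r50_fpn_1x_fair1m_le90.py'
checkpoint = 'weights/petdet_r50_fpn_1x_fair1m_le90.pth'
model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')  # hypothetical image path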

Training

The following command trains petdet_r50_fpn_1x_fair1m_le90 on 4 GPUs:

./tools/dist_train.sh configs/petdet/petdet_r50_fpn_1x_fair1m_le90.py 4

Notes:

  • The models will be saved into work_dirs/petdet_r50_fpn_1x_fair1m_le90.
  • If you use a different mini-batch size, please change the learning rate according to the Linear Scaling Rule (see the sketch after these notes).
  • We train these models on 4 RTX 3090 GPUs with a mini-batch size of 8 images (2 images per GPU). However, we found that training with a smaller batch size may yield slightly better results on FGOD tasks.
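
As a hedged illustration of the Linear Scaling Rule (the base values below are assumptions; check the config for the actual defaults):

# If the config's learning rate was tuned for a total batch of 8 images,
# scale it linearly when the batch size changes.
base_lr, base_batch = 0.005, 8  # assumed defaults; check the config
new_batch = 4                   # e.g. 4 GPUs x 1 image per GPU
new_lr = base_lr * new_batch / base_batch
print(new_lr)  # 0.0025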