/MRT-release

Official implementation of paper: Masked Retraining Teacher-student Framework for domain adaptive object detection.

Primary LanguagePython

MRT

Official implementation for paper: Masked Retraining Teacher-student Framework for Domain Adaptive Object Detection

By Zijing Zhao, Sitong Wei, Qingchao Chen, Dehui Li, Yifan Yang, Yuxin Peng and Yang Liu

The paper has been accepted by IEEE/CVF International Conference on Computer Vision (ICCV), 2023

method

Domain adaptive Object Detection (DAOD) leverages a labeled domain (source) to learn an object detector generalizing to a novel domain without annotation (target). Recent advances use a teacher-student framework, i.e., a student model is supervised by the pseudo labels from a teacher model. Though great success, they suffer from the limited number of pseudo boxes with incorrect predictions caused by the domain shift, misleading the student model to get sub-optimal results. To mitigate this problem, we propose Masked Retraining Teacher-student framework (MRT) which leverages masked autoencoder and selective retraining mechanism on detection transformer. Experiments show that our method outperforms existing approaches and achieves state-of-the-art on three domain adaptive object detection benchmarks.

We use Deformable DETR as the base detector. This code is built upon the original repository: https://github.com/fundamentalvision/Deformable-DETR, we thank for their excellent work.

1. Installation

1.1 Requirements

  • Linux, CUDA >= 11.1, GCC >= 8.4

  • Python >= 3.8

  • torch >= 1.10.1, torchvision >= 0.11.2

  • Other requirements

    pip install -r requirements.txt

1.2 Compiling Deformable DETR CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

2. Usage

2.1 Data preparation

We provide the 2 benchmarks in our paper:

  • city2foggy: cityscapes dataset is used as source domain, and foggy_cityscapes(0.02) is used as target domain.
  • sim2city: sim10k dataset is used as source domain, and cityscapes which only record AP of cars is used as target domain.
  • city2bdd: cityscapes dataset is used as source domain, and bdd100k-daytime is used as target domain.

You can download the raw data from the official websites: cityscapes, foggy_cityscapes, sim10k, bdd100k. We provide the annotations that are converted into coco style, download from here and organize the datasets and annotations as following:

[data_root]
└─ cityscapes
	└─ annotations
		└─ cityscapes_train_cocostyle.json
		└─ cityscapes_train_caronly_cocostyle.json
		└─ cityscapes_val_cocostyle.json
		└─ cityscapes_val_caronly_cocostyle.json
	└─ leftImg8bit
		└─ train
		└─ val
└─ foggy_cityscapes
	└─ annotations
		└─ foggy_cityscapes_train_cocostyle.json
		└─ foggy_cityscapes_val_cocostyle.json
	└─ leftImg8bit_foggy
		└─ train
		└─ val
└─ sim10k
	└─ annotations
		└─ sim10k_train_cocostyle.json
		└─ sim10k_val_cocostyle.json
	└─ JPEGImages
└─ bdd10k
	└─ annotations
		└─ bdd100k_daytime_train_cocostyle.json
		└─ bdd100k_daytime_val_cocostyle.json
	└─ JPEGImages

To use additional datasets, you can edit datasets/coco_style_dataset.py and add key-value pairs to CocoStyleDataset.img_dirs and CocoStyleDataset.anno_files .

2.2 Training and evaluation

As has been discussed in implementation details in the paper, to save computation cost, our method is designed as a three-stage paradigm. We first perform source_only training which is trained standardly by labeled source domain. Then, we perform cross_domain_mae to train the model with MAE branch. Finally, we perform teaching which utilize a teacher-student framework with MAE branch and selective retraining.

For example, for city2foggy benchmark, first edit the files in configs/def-detr-base/city2foggy/ to specify your own DATA_ROOT and OUTPUT_DIR, then run:

sh configs/def-detr-base/city2foggy/source_only.sh
sh configs/def-detr-base/city2foggy/cross_domain_mae.sh
sh configs/def-detr-base/city2foggy/teaching.sh

We use tensorboard to record the loss and results. Run the following command to see the curves during training:

tensorboard --logdir=<YOUR/LOG/DIR>

To evaluate the trained model and get the predicted results, run:

sh configs/def-detr-base/city2foggy/evaluation.sh

3. Results and Model Parameters

We conduct all experiments with batch size 8 (for source_only stage, 8 labeled samples; for cross_domain_mae and MRT teaching stage, 8 labeled samples and 8 unlabeled samples), on 2 NVIDIA A100 GPUs.

city2foggy: cityscapes → foggy cityscapes(0.02)

backbone encoder layers decoder layers training stage AP@50 logs & weights
resnet50 6 6 source_only 29.5 logs & weights
resnet50 6 6 cross_domain_mae 35.8 logs & weights
resnet50 6 6 MRT teaching 51.2 logs & weights

sim2city: sim10k → cityscapes(car only)

backbone encoder layers decoder layers training stage AP@50 logs & weights
resnet50 6 6 source_only 53.2 logs & weights
resnet50 6 6 cross_domain_mae 57.1 logs & weights
resnet50 6 6 MRT teaching 62.0 logs & weights

city2bdd: cityscapes → bdd100k(daytime)

backbone encoder layers decoder layers training stage AP@50 logs & weights
resnet50 6 6 source_only 29.6 logs & weights
resnet50 6 6 cross_domain_mae 31.1 logs & weights
resnet50 6 6 MRT teaching 33.7 logs & weights

4. Citation

This repository is constructed and maintained by Zijing Zhao and Sitong Wei.

If you find our paper or project useful, please cite our work by the following BibTeX:

@inproceedings{zhao2023masked,
  title={Masked retraining teacher-student framework for domain adaptive object detection},
  author={Zhao, Zijing and Wei, Sitong and Chen, Qingchao and Li, Dehui and Yang, Yifan and Peng, Yuxin and Liu, Yang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={19039--19049},
  year={2023}
}

Thanks for your attention.