LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

This is the official repository with the PyTorch implementation of LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection.

☀️ If you find this work useful for your research, please kindly star our repo and cite our paper! ☀️

Highlights

  • Release a series of real-time detection models in LW-DETR, including LW-DETR-tiny, LW-DETR-small, LW-DETR-medium, LW-DETR-large and LW-DETR-xlarge, named LWDETR_<size>_60e_coco.pth. Please refer to Hugging Face to download.
  • Release a series of pretrained models in LW-DETR. Please refer to Hugging Face to download.

News

[2024/7/15] We present OVLW-DETR, an efficient open-vocabulary detector with outstanding performance and low latency, built upon LW-DETR. It surpasses existing real-time open-vocabulary detectors on the standard Zero-Shot LVIS benchmark. The source code and pre-trained models are coming soon, please stay tuned!

Catalogue

  1. Introduction
  2. Installation
  3. Preparation
  4. Train
  5. Eval
  6. Deploy
  7. Main Results
  8. Citation

1. Introduction

LW-DETR is a light-weight detection transformer that outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. LW-DETR leverages recent advanced techniques, such as training-effective techniques (e.g., improved loss and pretraining) and interleaved window and global attentions for reducing the ViT encoder complexity. It further improves the ViT encoder by aggregating multi-level feature maps, together with the intermediate and final feature maps in the encoder, to form richer feature maps, and introduces a window-major feature-map organization that improves the efficiency of interleaved attention computation. LW-DETR achieves superior performance over existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets.
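
To make the stack concrete, here is a minimal PyTorch sketch of the encoder-projector-decoder pipeline with interleaved window and global attention. It is an illustration only: the dimensions, depth, 1D windowing, and feature aggregation below are simplified assumptions, not the repository's actual implementation (see models/ for that).

# Minimal sketch of the LW-DETR pipeline (illustrative only). Windows are taken
# over the 1D token sequence here for brevity; the real model windows over the
# 2D patch grid.
from typing import Optional

import torch
from torch import nn


class InterleavedBlock(nn.Module):
    """ViT block that attends within fixed windows (window is not None) or globally."""

    def __init__(self, dim: int, num_heads: int, window: Optional[int]):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        b, n, c = x.shape
        h = self.norm1(x)
        if self.window is not None:
            # window-major organization: fold windows into the batch dimension
            # so a single attention call covers all windows at once
            w = self.window
            assert n % w == 0, "token count must be divisible by the window size"
            h = h.reshape(b * (n // w), w, c)
            h, _ = self.attn(h, h, h, need_weights=False)
            h = h.reshape(b, n, c)
        else:
            h, _ = self.attn(h, h, h, need_weights=False)
        x = x + h
        return x + self.mlp(self.norm2(x))


class LWDETRSketch(nn.Module):
    """ViT encoder -> projector -> shallow DETR decoder (hyper-params illustrative)."""

    def __init__(self, dim=192, num_heads=3, depth=6, window=25,
                 num_queries=100, num_classes=91):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # interleave: even blocks use window attention, odd blocks global attention
        self.blocks = nn.ModuleList(
            InterleavedBlock(dim, num_heads, window if i % 2 == 0 else None)
            for i in range(depth)
        )
        self.projector = nn.Linear(dim, dim)  # stand-in for the multi-level projector
        layer = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)  # shallow decoder
        self.queries = nn.Embedding(num_queries, dim)
        self.class_head = nn.Linear(dim, num_classes)
        self.box_head = nn.Linear(dim, 4)

    def forward(self, images: torch.Tensor):  # images: (B, 3, H, W)
        x = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, C)
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        # aggregate intermediate and final encoder features into a richer map
        memory = self.projector(torch.stack(feats).mean(dim=0))
        q = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.decoder(q, memory)
        return self.class_head(hs), self.box_head(hs).sigmoid()


if __name__ == "__main__":
    model = LWDETRSketch()
    logits, boxes = model(torch.randn(1, 3, 320, 320))  # 20x20 = 400 tokens
    print(logits.shape, boxes.shape)  # (1, 100, 91) and (1, 100, 4)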

2. Installation

Requirements

The code is developed and validated with python=3.8.19, pytorch=1.13.0, cuda=11.6, and TensorRT-8.6.1.6. Higher versions may work as well.

  1. Create your own Python environment with Anaconda.
conda create -n lwdetr python=3.8.19
conda activate lwdetr
  2. Clone this repo.
git clone https://github.com/Atten4Vis/LW-DETR.git
cd LW-DETR
  3. Install PyTorch and torchvision.

Follow the instructions at https://pytorch.org/get-started/locally/.

# an example:
conda install pytorch==1.13.0 torchvision==0.14.0 pytorch-cuda=11.6 -c pytorch -c nvidia
  4. Install required packages.

For training and evaluation:

pip install -r requirements.txt

For deployment:

Please refer to NVIDIA for TensorRT installation instructions.

pip install -r deploy/requirements.txt
  5. Compile CUDA operators.
cd models/ops
python setup.py build install
# unit test (all checks should report True)
python test.py
cd ../..
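
If the build and unit test succeed, the compiled extension can also be imported directly. A quick smoke test is below; note that the module name MultiScaleDeformableAttention is an assumption based on the Deformable-DETR-style ops layout used here, so adjust it if your build reports a different name.

# smoke test for the compiled CUDA ops (module name is an assumption, see above)
import torch
import MultiScaleDeformableAttention  # built by `python setup.py build install`

print("extension loaded from:", MultiScaleDeformableAttention.__file__)
print("CUDA available:", torch.cuda.is_available())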

3. Preparation

Data preparation

For MS COCO dataset, please download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json
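
As a quick sanity check of the layout, the annotations should load cleanly with pycocotools (installed via requirements.txt). The path below is a placeholder for your own COCODIR:

# verify the COCO layout before training or evaluation (path is a placeholder)
from pycocotools.coco import COCO

coco = COCO("/path/to/your/COCODIR/annotations/instances_val2017.json")
print(len(coco.getImgIds()), "val images,", len(coco.getCatIds()), "categories")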

For pretraining on the Objects365 dataset, please download the Objects365 images with annotations from https://www.objects365.org/overview.html.

Model preparation

All the checkpoints can be found in Hugging Face.

  1. Pretraining on Objects365.
  • Pretrain the ViT.

We pretrain the ViT on the Objects365 dataset using a MIM method, CAE v2, based on pretrained models. Please refer to the following link to download the pretrained models, and put them into pretrain_weights/.

Model                          Comment
caev2_tiny_300e_objects365     pretrained ViT model on Objects365 for LW-DETR-tiny/small using CAE v2
caev2_small_300e_objects365    pretrained ViT model on Objects365 for LW-DETR-medium/large using CAE v2
caev2_base_300e_objects365     pretrained ViT model on Objects365 for LW-DETR-xlarge using CAE v2
  • Pretrain LW-DETR.

We retrain the encoder and train the projector and the decoder on Objects365 in a supervised manner. Please refer to the following link to download the pretrained models, and put them into pretrain_weights/.

Model                          Comment
LWDETR_tiny_30e_objects365     pretrained LW-DETR-tiny model on Objects365
LWDETR_small_30e_objects365    pretrained LW-DETR-small model on Objects365
LWDETR_medium_30e_objects365   pretrained LW-DETR-medium model on Objects365
LWDETR_large_30e_objects365    pretrained LW-DETR-large model on Objects365
LWDETR_xlarge_30e_objects365   pretrained LW-DETR-xlarge model on Objects365
  2. Finetuning on COCO. We finetune the pretrained models on COCO. If you plan to train the models yourself, you can skip this step. If you want to directly evaluate our trained models, please refer to the following link to download the finetuned models, and put them into output/.
Model                    Comment
LWDETR_tiny_60e_coco     finetuned LW-DETR-tiny model on COCO
LWDETR_small_60e_coco    finetuned LW-DETR-small model on COCO
LWDETR_medium_60e_coco   finetuned LW-DETR-medium model on COCO
LWDETR_large_60e_coco    finetuned LW-DETR-large model on COCO
LWDETR_xlarge_60e_coco   finetuned LW-DETR-xlarge model on COCO
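
Before running evaluation, it can be useful to peek inside a downloaded checkpoint. The snippet below assumes a standard torch.save dictionary; the 'model' key is a common DETR-style convention, not guaranteed, so print the keys to confirm.

# inspect a finetuned checkpoint (key names are assumptions; print to confirm)
import torch

ckpt = torch.load("output/LWDETR_tiny_60e_coco.pth", map_location="cpu")
print("top-level keys:", list(ckpt.keys()))
state = ckpt.get("model", ckpt)  # fall back to treating the file as a raw state_dict
print(f"{sum(p.numel() for p in state.values()) / 1e6:.1f}M parameters")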

4. Train

You can directly run the scripts/lwdetr_<model_size>_coco_train.sh files to train on the COCO dataset.

Train a LW-DETR-tiny model
sh scripts/lwdetr_tiny_coco_train.sh /path/to/your/COCODIR
Train a LW-DETR-small model
sh scripts/lwdetr_small_coco_train.sh /path/to/your/COCODIR
Train a LW-DETR-medium model
sh scripts/lwdetr_medium_coco_train.sh /path/to/your/COCODIR
Train a LW-DETR-large model
sh scripts/lwdetr_large_coco_train.sh /path/to/your/COCODIR
Train a LW-DETR-xlarge model
sh scripts/lwdetr_xlarge_coco_train.sh /path/to/your/COCODIR
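
To train all five sizes back to back, a small wrapper over the scripts above is enough. This helper is not part of the repo; it only invokes the training scripts listed in this section:

# optional helper: train every LW-DETR size sequentially (not part of the repo)
import subprocess

COCODIR = "/path/to/your/COCODIR"
for size in ("tiny", "small", "medium", "large", "xlarge"):
    subprocess.run(["sh", f"scripts/lwdetr_{size}_coco_train.sh", COCODIR], check=True)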

5. Eval

You can directly run the scripts/lwdetr_<model_size>_coco_eval.sh files to evaluate on the COCO dataset. Please refer to 3. Preparation to download a series of LW-DETR models.

Eval our pretrained LW-DETR-tiny model
sh scripts/lwdetr_tiny_coco_eval.sh /path/to/your/COCODIR /path/to/your/checkpoint
Eval our pretrained LW-DETR-small model
sh scripts/lwdetr_small_coco_eval.sh /path/to/your/COCODIR /path/to/your/checkpoint
Eval our pretrained LW-DETR-medium model
sh scripts/lwdetr_medium_coco_eval.sh /path/to/your/COCODIR /path/to/your/checkpoint
Eval our pretrained LW-DETR-large model
sh scripts/lwdetr_large_coco_eval.sh /path/to/your/COCODIR /path/to/your/checkpoint
Eval our pretrained LW-DETR-xlarge model
sh scripts/lwdetr_xlarge_coco_eval.sh /path/to/your/COCODIR /path/to/your/checkpoint
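
Similarly, the finetuned checkpoints downloaded into output/ can be evaluated in one pass. This is again a convenience wrapper (not part of the repo), assuming the LWDETR_<size>_60e_coco.pth naming from 3. Preparation:

# optional helper: evaluate every finetuned checkpoint in output/ (not part of the repo)
import subprocess

COCODIR = "/path/to/your/COCODIR"
for size in ("tiny", "small", "medium", "large", "xlarge"):
    ckpt = f"output/LWDETR_{size}_60e_coco.pth"
    subprocess.run(["sh", f"scripts/lwdetr_{size}_coco_eval.sh", COCODIR, ckpt], check=True)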

6. Deploy

Export models

You can run the scripts/lwdetr_<model_size>_coco_export.sh files to export models for deployment. Before execution, please ensure that the TensorRT and cuDNN environment variables are set correctly.

Export a LW-DETR-tiny model
# export ONNX model
sh scripts/lwdetr_tiny_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint
# convert model from ONNX to TensorRT engine as well
sh scripts/lwdetr_tiny_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint --trt
Export a LW-DETR-small model
# export ONNX model
sh scripts/lwdetr_small_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint
# convert model from ONNX to TensorRT engine as well
sh scripts/lwdetr_small_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint --trt
Export a LW-DETR-medium model
# export ONNX model
sh scripts/lwdetr_medium_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint
# convert model from ONNX to TensorRT engine as well
sh scripts/lwdetr_medium_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint --trt
Export a LW-DETR-large model
# export ONNX model
sh scripts/lwdetr_large_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint
# convert model from ONNX to TensorRT engine as well
sh scripts/lwdetr_large_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint --trt
Export a LW-DETR-xlarge model
# export ONNX model
sh scripts/lwdetr_xlarge_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint
# convert model from ONNX to TensorRT engine as well
sh scripts/lwdetr_xlarge_coco_export.sh /path/to/your/COCODIR /path/to/your/checkpoint --trt
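
After export, you can confirm that the ONNX file is well formed and inspect its inputs with onnx and onnxruntime (install onnxruntime separately if it is not included in deploy/requirements.txt). The file path is a placeholder:

# sanity-check an exported ONNX model (path is a placeholder)
import onnx
import onnxruntime as ort

path = "/path/to/your/onnxmodel.onnx"
onnx.checker.check_model(onnx.load(path))
sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
print([(i.name, i.shape, i.type) for i in sess.get_inputs()])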

Run benchmark

You can use the deploy/benchmark.py tool to benchmark inference latency.

# evaluate and benchmark the latency of an ONNX model
python deploy/benchmark.py --path=/path/to/your/onnxmodel --coco_path=/path/to/your/COCODIR --run_benchmark
# evaluate and benchmark the latency of a TensorRT engine
python deploy/benchmark.py --path=/path/to/your/trtengine --coco_path=/path/to/your/COCODIR --run_benchmark

7. Main Results

Main results on the COCO dataset. We report the mAP as reported in the original paper, with the mAP obtained from our re-implementation in parentheses.

Method          Pretraining  Params (M)  FLOPs (G)  Model Latency (ms)  Total Latency (ms)  mAP          Download
LW-DETR-tiny    ✓            12.1        11.2       2.0                 2.0                 42.6 (42.9)  Link
LW-DETR-small   ✓            14.6        16.6       2.9                 2.9                 48.0 (48.1)  Link
LW-DETR-medium  ✓            28.2        42.8       5.6                 5.6                 52.5 (52.6)  Link
LW-DETR-large   ✓            46.8        71.6       8.8                 8.8                 56.1 (56.1)  Link
LW-DETR-xlarge  ✓            118.0       174.2      19.1                19.1                58.3 (58.3)  Link

All models are pretrained on Objects365 and then finetuned on COCO; see 3. Preparation.

8. Citation

If you find this code useful in your research, please kindly consider citing our paper:

    @article{chen2024lw,
        title={LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection},
        author={Chen, Qiang and Su, Xiangbo and Zhang, Xinyu and Wang, Jian and Chen, Jiahui and Shen, Yunpeng and Han, Chuchu and Chen, Ziliang and Xu, Weixiang and Li, Fanrong and others},
        journal={arXiv preprint arXiv:2406.03459},
        year={2024}
    }