Code implementation of my paper DA3D. The code is based on mmyolo.
```shell
conda create -n DA3D python=3.7
conda activate DA3D
```
Install PyTorch.
```shell
# CUDA 11.6
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
# CUDA 11.7
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```
Install the custom library cv_ops and the other dependencies.
```shell
pip install -U openmim
mim install "mmengine==0.7.0"
mim install "mmcv==2.0.0rc4"
mim install "mmdet==3.0.0rc6"
mim install "mmdet3d==1.1.0rc3"
git clone https://github.com/jiayisong/DA3D.git
cd DA3D
# Install albumentations
mim install -r requirements/albu.txt
# Install MMYOLO
mim install -v -e .
# "-v" means verbose, i.e. more output
# "-e" means install the project in editable mode, so any local modifications to the code take effect without reinstalling
```
Download the images from KITTI, including "left color images of object data set (12 GB)" and, if you want to use stereo information, "right color images of object data set (12 GB)".
The label files need to be converted; for convenience, I have uploaded the converted files directly. They are kitti_infos_test.pkl, kitti_infos_train.pkl, kitti_infos_trainval.pkl, and kitti_infos_val.pkl.
Unzip the image files and organize them together with the label files as follows.
```
kitti
├── testing
│   ├── image_2
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
│   ├── image_3
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
├── training
│   ├── image_2
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
│   ├── image_3
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
├── kitti_infos_test.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
```
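To confirm the info files load correctly, here is a minimal sketch, assuming the mmdet3d-style pickle layout (the exact keys can differ between versions):

```python
# Sketch: inspect a converted KITTI info file (exact structure may vary).
import pickle

with open("kitti/kitti_infos_val.pkl", "rb") as f:
    infos = pickle.load(f)

if isinstance(infos, dict):
    print("top-level keys:", list(infos.keys()))
else:  # older formats store a plain list of per-sample dicts
    print("number of samples:", len(infos))
    print("first sample keys:", list(infos[0].keys()))
```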
Modify the configuration file as needed, according to the dataset location.
Due to the presence of the PPP module, the input channels of the first convolution layer must be changed to 4. For the simplicity of the code, we directly provide the modified pre-trained model weights: cspnext-s, dla-34, and v2-99. Note that you have to specify the location of the pre-trained weights in the configuration file, or place them in the following location so that no configuration change is needed (a sketch of the channel conversion follows the tree below).
```
DA3D
├── model_weight
│   ├── dla34-ba72cf86-base_layer_channel-4.pth
│   ├── cspnext-s_imagenet_600e_channel-4.pth
│   ├── depth_pretrained_v99_channel-4.pth
├── configs
├── ...
```
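If you would rather convert a standard 3-channel checkpoint yourself, the conversion could look like the sketch below. The checkpoint filename and the first-layer key are hypothetical placeholders (the real key depends on the backbone), and zero-initializing the extra channel is one plausible choice, not necessarily how the provided weights were produced.

```python
# Sketch: expand a pre-trained first conv layer from 3 to 4 input channels.
import torch

ckpt = torch.load("dla34-ba72cf86.pth", map_location="cpu")  # hypothetical file
state_dict = ckpt.get("state_dict", ckpt)  # handle both wrapped and raw dicts

first_conv_key = "base_layer.0.weight"  # hypothetical key; inspect your backbone
w = state_dict[first_conv_key]          # shape (out_channels, 3, k, k)
extra = w.new_zeros((w.shape[0], 1, *w.shape[2:]))  # zero-init the 4th channel
state_dict[first_conv_key] = torch.cat([w, extra], dim=1)  # now (out, 4, k, k)

torch.save(ckpt, "dla34-ba72cf86-base_layer_channel-4.pth")
```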
Similar to mmyolo, train with the following commands. The batch size used for the method in the paper is 8. When training with multiple GPUs, remember to adjust the per-GPU batch size in the configuration file so that the total batch size stays the same (see the config sketch after the commands).
```shell
# Single GPU
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/rtmdet/det3d/TableV_line1.py
# Multi GPU
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh configs/rtmdet/det3d/TableV_line1.py 4
```
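For the multi-GPU case, the batch-size adjustment mentioned above would look something like this in the config; `train_batch_size_per_gpu` is the usual mmyolo-style field name, but check the actual config file for the name it uses.

```python
# Hypothetical config override: keep the paper's total batch size of 8.
# 1 GPU : train_batch_size_per_gpu = 8
# 4 GPUs: train_batch_size_per_gpu = 2   (2 per GPU x 4 GPUs = 8 total)
train_batch_size_per_gpu = 2
```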
Similar to mmyolo, test with the following command.
```shell
CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/rtmdet/det3d/TableV_line1.py work_dirs/TableV_line1/epoch_125.pth
```
When the test is complete, a number of txt result files are generated in work_dir/result. Compressed into a zip, they can be uploaded to the official KITTI server.
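Packing the results for submission could look like the following sketch; the result directory is the one from the step above, and the flat archive layout (txt files at the zip root) is an assumption, so double-check the KITTI submission instructions.

```python
# Sketch: zip the generated txt result files for a KITTI server submission.
import zipfile
from pathlib import Path

result_dir = Path("work_dir/result")  # directory produced by the test step
with zipfile.ZipFile("kitti_submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for txt in sorted(result_dir.glob("*.txt")):
        zf.write(txt, arcname=txt.name)  # assumed layout: files at zip root
```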
The models I trained are given here. The following table is the same as Table V in the paper; the evaluation metrics are IoU=0.7, R40, AP_3D / AP_BEV on the validation set.
| Network | Loss | DA | Easy | Mod. | Hard | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RTM | SMOKE |  | 8.57 / 11.65 | 7.89 / 10.94 | 7.00 / 9.88 | config | model \| log |
| RTM | SMOKE | ✓ | 16.40 / 21.29 | 13.32 / 17.34 | 11.36 / 15.00 | config | model \| log |
| RTM | MonoFlex |  | 14.38 / 18.90 | 11.27 / 15.07 | 9.65 / 12.98 | config | model \| log |
| RTM | MonoFlex | ✓ | 21.79 / 25.95 | 17.04 / 20.86 | 14.87 / 18.23 | config | model \| log |
| DLA | MonoFlex |  | 20.90 / 26.61 | 16.29 / 20.99 | 14.46 / 18.71 | config | model \| log |
| DLA | MonoFlex | ✓ | 25.66 / 31.56 | 21.68 / 26.73 | 19.27 / 23.80 | config | model \| log |
The following table is the same as Table VI in the paper; the evaluation metrics are IoU=0.7, R40, AP_3D / AP_BEV on the test set, evaluated through the official server.
| Method | Easy | Mod. | Hard | Time (ms) | GPU | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DA3D | 27.76 / 36.83 | 20.47 / 26.92 | 17.89 / 23.41 | 22 | 2080Ti | config | model \| log |
| DA3D* | 30.83 / 39.50 | 22.08 / 28.71 | 19.20 / 25.20 | 22 | 2080Ti | config | model \| log |
| DA3D** | 34.72 / 44.27 | 26.80 / 34.88 | 23.05 / 30.29 | 120 | 2080Ti | config | model \| log |
If you find this project useful in your research, please consider citing:
```bibtex
@ARTICLE{10497146,
  author={Jia, Yisong and Wang, Jue and Pan, Huihui and Sun, Weichao},
  journal={IEEE Transactions on Instrumentation and Measurement},
  title={Enhancing Monocular 3-D Object Detection Through Data Augmentation Strategies},
  year={2024},
  volume={73},
  number={},
  pages={1-11},
  keywords={Three-dimensional displays;Object detection;Data augmentation;Task analysis;Pipelines;Cameras;Detectors;Autonomous driving;data augmentation;deep learning;monocular 3-D object detection},
  doi={10.1109/TIM.2024.3387500}
}
```