A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View
The bird's-eye-view (BEV) representation allows robust learning of multiple tasks for autonomous driving, including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely address the class imbalance of the training dataset, or multi-class learning as a way to reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model counters the performance degradation caused by class imbalance in the dataset using the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations. To verify the effectiveness of the proposed model and the learning scheme, we conduct a thorough ablation study and a comparative study. The experimental results attest to the effectiveness of our model; we achieve state-of-the-art performance in both the road layout estimation and 3D object detection tasks.
- DCT Architecture: We propose the dual-cycled cross-view transformer (DCT) network for unified road layout estimation and 3D object detection for autonomous driving, along with a learning scheme to handle the class imbalance.
- Multi-Class Learning: We investigate the effect of multi-class learning in the context of road layout estimation for the first time to the best of our knowledge.
- Ablation Study: We conduct a thorough ablation study and reveal important intuitions for the effect of each design choice.
- SoTA Performance: We achieve state-of-the-art performance on both road layout estimation and 3D object detection in the Argoverse and KITTI 3D Object datasets, respectively.
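The focal loss mentioned above down-weights pixels that are already classified well, so that rare classes (for example, vehicles in a mostly empty BEV grid) contribute more to training. The following is a minimal pure-Python sketch of the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name and the defaults `alpha=0.25` and `gamma=2.0` are illustrative assumptions, not values taken from this repository, and the dual cycle loss is not covered here.

```python
import math


def binary_focal_loss(prob, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for a single pixel (illustrative sketch).

    prob   -- predicted probability of the positive class, in [0, 1]
    target -- ground-truth label, 0 or 1
    alpha, gamma -- assumed defaults, not the repository's settings
    """
    prob = min(max(prob, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = prob if target == 1 else 1.0 - prob    # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With these defaults, a confident correct prediction such as `binary_focal_loss(0.9, 1)` yields a much smaller loss than a confident wrong one such as `binary_focal_loss(0.1, 1)`; setting `gamma=0.0` reduces the expression to an alpha-weighted cross-entropy.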
DCTNet/
├── crossView        # Contains scripts for dataloaders and network/model architecture
├── datasets         # Contains datasets
│   ├── argoverse    # argoverse dataset
│   └── kitti        # kitti dataset
├── log              # Contains a log of network/model
├── losses           # Contains scripts for loss of network/model
├── models           # Contains the saved model of the network/model
├── output           # Contains output of network/model
└── splits
    ├── 3Dobject     # Training and testing splits for KITTI 3DObject Detection dataset
    ├── argo         # Training and testing splits for Argoverse Tracking v1.0 dataset
    ├── odometry     # Training and testing splits for KITTI Odometry dataset
    └── raw          # Training and testing splits for KITTI RAW dataset (based on Schulter et al.)
Our code was tested in a virtual environment with Python 3.7, PyTorch 1.7.1, and torchvision 0.8.2, with all the dependencies listed in the requirements file installed.
git clone https://github.com/AutoCompSysLab/DCTNet
cd DCTNet
pip install -r requirements.txt
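After installing, it can help to compare the local environment against the versions listed above. The following is a small sketch of such a check; the helper names `installed_version` and `environment_report` are hypothetical and not part of this repository, and a version mismatch does not necessarily mean the code will fail.

```python
import sys

# Versions this README reports as tested; other versions may also work.
TESTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def environment_report():
    """Build one human-readable line per component."""
    lines = ["python: found {}.{}, tested with 3.7".format(*sys.version_info[:2])]
    for name, tested in TESTED.items():
        found = installed_version(name) or "not installed"
        lines.append("{}: found {}, tested with {}".format(name, found, tested))
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```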
In the paper, we present results for the KITTI 3D Object, KITTI Odometry, KITTI RAW, and Argoverse 3D Tracking v1.0 datasets. For comparison with Schulter et al., we use the same training and test split sequences from the KITTI RAW dataset. For more details about the training/testing splits, see the splits directory. The ground truth can be downloaded from Monolayout. If the road-label link in Monolayout is invalid, please try these links offered by JPerceiver: KITTI RAW and KITTI Odometry.
# Download KITTI RAW
./data/download_datasets.sh raw
# Download KITTI 3D Object
./data/download_datasets.sh object
# Download KITTI Odometry
./data/download_datasets.sh odometry
# Download Argoverse Tracking v1.0
./data/download_datasets.sh argoverse
The above scripts will download, unzip and store the respective datasets in the datasets directory.
datasets/
├── argoverse                          # argoverse dataset
│   └── argoverse-tracking
│       └── train1
│           └── 1d676737-4110-3f7e-bec0-0c90f74c248f
│               ├── car_bev_gt         # Vehicle GT
│               ├── road_gt            # Road GT
│               └── stereo_front_left  # RGB image
└── kitti                              # kitti dataset
    ├── object                         # kitti 3D Object dataset
    │   └── training
    │       ├── image_2                # RGB image
    │       └── vehicle_256            # Vehicle GT
    ├── odometry                       # kitti odometry dataset
    │   └── 00
    │       ├── image_2                # RGB image
    │       └── road_dense128          # Road GT
    └── raw                            # kitti raw dataset
        └── 2011_09_26
            └── 2011_09_26_drive_0001_sync
                ├── image_2            # RGB image
                └── road_dense128      # Road GT
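Before training, it may be worth confirming that a sample directory contains the image and ground-truth folders shown in the layout above. The sketch below does that with `pathlib`; the function name `missing_subdirs` and the dictionary keys are hypothetical labels chosen for this example, while the folder names are copied from the layout.

```python
from pathlib import Path

# Sub-folders expected inside one sample directory, per the layout above.
# The keys are labels chosen for this sketch, not identifiers from the repo.
EXPECTED_SUBDIRS = {
    "argoverse": ["car_bev_gt", "road_gt", "stereo_front_left"],
    "object": ["image_2", "vehicle_256"],
    "odometry": ["image_2", "road_dense128"],
    "raw": ["image_2", "road_dense128"],
}


def missing_subdirs(sample_dir, dataset):
    """Return the expected sub-folders that are absent from sample_dir."""
    root = Path(sample_dir)
    return [name for name in EXPECTED_SUBDIRS[dataset]
            if not (root / name).is_dir()]
```

For example, `missing_subdirs("datasets/kitti/object/training", "object")` returns an empty list when both `image_2` and `vehicle_256` are present.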
- Prepare the corresponding dataset
- Run training
# Road (KITTI Odometry)
python3 train.py --type static --split odometry --data_path ./datasets/kitti/odometry/ --model_name <Model Name with specifications>
# Vehicle (KITTI 3D Object)
python3 train.py --type dynamic --split 3Dobject --data_path ./datasets/kitti/object/training --model_name <Model Name with specifications>
# Road (KITTI RAW)
python3 train.py --type static --split raw --data_path ./datasets/kitti/raw/ --model_name <Model Name with specifications>
# Vehicle (Argoverse Tracking v1.0)
python3 train.py --type dynamic --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>
# Road (Argoverse Tracking v1.0)
python3 train.py --type static --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>
# Vehicle and Road (Argoverse Tracking v1.0)
python3 train.py --type both --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications> --lr_steps 100 --num_class 3
- The trained models are saved in "models" (default: ./models)
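When launching several of these training runs from a script, the command line can be assembled programmatically. The sketch below uses only the flags that appear in the commands above; the helper `build_train_command` and the model name `dct_argo_both` are hypothetical, and the list of accepted `--type` values is inferred from the examples rather than from `train.py` itself.

```python
def build_train_command(task_type, split, data_path, model_name, extra=()):
    """Return the train.py invocation as an argument list.

    task_type -- "static" (road), "dynamic" (vehicle) or "both",
                 as used in the example commands
    extra     -- additional flags, e.g. ["--lr_steps", "100", "--num_class", "3"]
    """
    if task_type not in ("static", "dynamic", "both"):
        raise ValueError("unexpected --type value: {!r}".format(task_type))
    cmd = [
        "python3", "train.py",
        "--type", task_type,
        "--split", split,
        "--data_path", data_path,
        "--model_name", model_name,
    ]
    cmd.extend(extra)
    return cmd


if __name__ == "__main__":
    # Joint vehicle-and-road run on Argoverse, mirroring the last command above.
    cmd = build_train_command(
        "both", "argo", "./datasets/argoverse/", "dct_argo_both",
        extra=["--lr_steps", "100", "--num_class", "3"],
    )
    print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.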
- Prepare the corresponding dataset
- Download pre-trained models
- Run evaluation
- The results are in "output" (default: ./output)
# Evaluate on KITTI Odometry
python3 eval.py --type static --split odometry --pretrained_path <path to the model directory> --data_path ./datasets/kitti/odometry --out_dir <path to the output directory>
# Evaluate on KITTI 3D Object
python3 eval.py --type dynamic --split 3Dobject --pretrained_path <path to the model directory> --data_path ./datasets/kitti/object/training --out_dir <path to the output directory>
# Evaluate on KITTI RAW
python3 eval.py --type static --split raw --pretrained_path <path to the model directory> --data_path ./datasets/kitti/raw/ --out_dir <path to the output directory>
# Evaluate on Argoverse Tracking v1.0 (Road)
python3 eval.py --type static --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse/ --out_dir <path to the output directory>
# Evaluate on Argoverse Tracking v1.0 (Vehicle)
python3 eval.py --type dynamic --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse --out_dir <path to the output directory>
# Evaluate on Argoverse Tracking v1.0 (Vehicle and Road)
python3 eval.py --type both --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse --out_dir <path to the output directory> --num_class 3
The following table provides links to the pre-trained models for each dataset mentioned in our paper. The table also shows the corresponding evaluation results for these models.
Dataset | Segmentation Objects | mIOU(%) | mAP(%) | Pretrained Model |
---|---|---|---|---|
KITTI 3D Object | Vehicle | 39.44 | 58.89 | link |
KITTI Odometry | Road | 77.15 | 88.28 | link |
KITTI Raw | Road | 65.86 | 86.56 | link |
Argoverse Tracking | Vehicle | 48.04 | 68.96 | link |
Argoverse Tracking | Road | 76.71 | 88.87 | link |
Dataset | Segmentation Objects | mIOU(%) | mAP(%) | Pretrained Model |
---|---|---|---|---|
Argoverse Tracking | Vehicle | 31.75 | 46.20 | link for both |
Argoverse Tracking | Road | 74.73 | 86.76 |
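For readers unfamiliar with the IoU figures reported in the tables, the sketch below computes intersection-over-union for a single pair of binary BEV masks in pure Python. It illustrates the metric's definition only; how `eval.py` aggregates scores across frames and classes into mIOU and mAP is not shown here, and the function name `binary_iou` is hypothetical.

```python
def binary_iou(pred, target):
    """IoU between two equally sized 2D binary masks (lists of 0/1 rows).

    Returns NaN when both masks are empty, since the ratio is undefined.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1   # cell occupied in both masks
            if p or t:
                union += 1          # cell occupied in at least one mask
    return intersection / union if union else float("nan")
```

For instance, a prediction that overlaps the ground truth in one cell, with three cells occupied in either mask, scores 1/3.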
If you run into any problems, please open an issue or contact:
- Curie Kim: curie3170@gmail.com
We thank the authors of the related open-source works. This project partially depends on the sources of Monolayout, PYVA, and JPerceiver.