DCTNet

ArXiv (2022), 'A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View'

A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View

Curie Kim, Ue-Hwan Kim

Abstract

The bird's-eye-view (BEV) representation allows robust learning of multiple tasks for autonomous driving, including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely handle the class imbalance of the training dataset or multi-class learning, which would reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model addresses the performance degradation caused by the class imbalance of the dataset by utilizing the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations. To verify the effectiveness of the proposed model and the learning scheme, we conduct a thorough ablation study and a comparative study. The experimental results attest to the effectiveness of our model: we achieve state-of-the-art performance in both the road layout estimation and 3D object detection tasks.
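
For intuition, the focal loss mentioned above down-weights well-classified pixels so that training focuses on hard, under-represented classes. Below is a minimal PyTorch sketch of a binary focal loss; the function and its hyperparameters (alpha, gamma) are a generic illustration of the technique, not the exact implementation in this repository.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: scales the per-pixel cross-entropy by
    (1 - p_t)^gamma so easy examples contribute less to the gradient."""
    # Per-pixel binary cross-entropy, kept unreduced
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)        # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Example: random BEV logits against sparse binary ground-truth masks
logits = torch.randn(2, 1, 256, 256)
targets = (torch.rand(2, 1, 256, 256) > 0.9).float()
print(focal_loss(logits, targets))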

Contributions

  • DCT Architecture: We propose the dual-cycled cross-view transformer (DCT) network for unified road layout estimation and 3D object detection in autonomous driving, along with a learning scheme that handles the class imbalance.
  • Multi-Class Learning: We investigate the effect of multi-class learning in the context of road layout estimation for the first time to the best of our knowledge.
  • Ablation Study: We conduct a thorough ablation study and reveal important intuitions for the effect of each design choice.
  • SoTA Performance: We achieve state-of-the-art performance on both road layout estimation and 3D object detection in the Argoverse and KITTI 3D Object datasets, respectively.

Approach overview

(Figure: overview of the proposed DCT architecture)
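
The figure itself is not reproduced here, but the cycle structure at the heart of the model can be summarized in code. The snippet below is a generic CycleGAN-style cycle-consistency term with stand-in linear maps; it illustrates the idea only and is not the paper's dual cycle loss.

import torch
import torch.nn as nn

# Stand-ins for the forward and backward view transforms (illustrative only)
G = nn.Linear(128, 128)   # e.g. front view -> BEV features
F_ = nn.Linear(128, 128)  # e.g. BEV features -> front view
l1 = nn.L1Loss()

x = torch.randn(4, 128)
cycle_loss = l1(F_(G(x)), x)  # penalize the round trip: ||F(G(x)) - x||_1
print(cycle_loss)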

Repository Structure

DCTNet/
├── crossView            # Contains scripts for dataloaders and network/model architecture
├── datasets             # Contains datasets
│   ├── argoverse        # Argoverse dataset
│   └── kitti            # KITTI dataset
├── log                  # Contains logs of the network/model
├── losses               # Contains scripts for the losses of the network/model
├── models               # Contains the saved models of the network
├── output               # Contains outputs of the network/model
└── splits
    ├── 3Dobject         # Training and testing splits for the KITTI 3D Object Detection dataset
    ├── argo             # Training and testing splits for the Argoverse Tracking v1.0 dataset
    ├── odometry         # Training and testing splits for the KITTI Odometry dataset
    └── raw              # Training and testing splits for the KITTI RAW dataset (based on Schulter et al.)

Installation

Our code was tested in a virtual environment with Python 3.7, PyTorch 1.7.1, and torchvision 0.8.2, with all the dependencies listed in the requirements file installed.

git clone https://github.com/AutoCompSysLab/DCTNet

cd DCTNet
pip install -r requirements.txt

Datasets

In the paper, we present results for the KITTI 3D Object, KITTI Odometry, KITTI RAW, and Argoverse 3D Tracking v1.0 datasets. For comparison with Schulter et al., we use the same training and test split sequences from the KITTI RAW dataset. For more details about the training/testing splits, see the splits directory. The ground truth can be downloaded from MonoLayout. If the road-label link in MonoLayout is invalid, please try these links provided by JPerceiver: KITTI RAW and KITTI Odometry.

# Download KITTI RAW
./data/download_datasets.sh raw

# Download KITTI 3D Object
./data/download_datasets.sh object

# Download KITTI Odometry
./data/download_datasets.sh odometry

# Download Argoverse Tracking v1.0
./data/download_datasets.sh argoverse

The above scripts will download, unzip and store the respective datasets in the datasets directory.

datasets/
├── argoverse                              # Argoverse dataset
│   └── argoverse-tracking
│       └── train1
│           └── 1d676737-4110-3f7e-bec0-0c90f74c248f
│               ├── car_bev_gt             # Vehicle GT
│               ├── road_gt                # Road GT
│               └── stereo_front_left      # RGB image
└── kitti                                  # KITTI dataset
    ├── object                             # KITTI 3D Object dataset
    │   └── training
    │       ├── image_2                    # RGB image
    │       └── vehicle_256                # Vehicle GT
    ├── odometry                           # KITTI Odometry dataset
    │   └── 00
    │       ├── image_2                    # RGB image
    │       └── road_dense128              # Road GT
    └── raw                                # KITTI RAW dataset
        └── 2011_09_26
            └── 2011_09_26_drive_0001_sync
                ├── image_2                # RGB image
                └── road_dense128          # Road GT
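
Once the downloads finish, it is worth verifying that every RGB frame has a matching ground-truth mask. The snippet below is a minimal sanity-check sketch against the KITTI Odometry layout above; the directory paths and the one-to-one filename pairing are assumptions, not part of the repository's dataloaders.

import os

# Hypothetical sanity check for the layout above: every RGB frame in image_2
# should have a same-named road GT mask in road_dense128.
image_dir = "./datasets/kitti/odometry/00/image_2"
gt_dir = "./datasets/kitti/odometry/00/road_dense128"

images = sorted(os.listdir(image_dir))
gts = set(os.listdir(gt_dir))
missing = [name for name in images if name not in gts]
print(f"{len(images)} frames, {len(missing)} without a road GT mask")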

Training

  1. Prepare the corresponding dataset
  2. Run training
# Road (KITTI Odometry)
python3 train.py --type static --split odometry --data_path ./datasets/odometry/ --model_name <Model Name with specifications>

# Vehicle (KITTI 3D Object)
python3 train.py --type dynamic --split 3Dobject --data_path ./datasets/kitti/object/training --model_name <Model Name with specifications>

# Road (KITTI RAW)
python3 train.py --type static --split raw --data_path ./datasets/kitti/raw/  --model_name <Model Name with specifications>

# Vehicle (Argoverse Tracking v1.0)
python3 train.py --type dynamic --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>

# Road (Argoverse Tracking v1.0)
python3 train.py --type static --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>

# Vehicle and Road (Argoverse Tracking v1.0)
python3 train.py --type both --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications> --lr_steps 100  --num_class 3
  3. The trained models are saved in "models" (default: ./models)
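
To queue several of the runs above back to back, a small driver script can launch them in sequence. This is a convenience sketch only; the configurations and model names below are illustrative, not part of the repository.

import subprocess

# Hypothetical driver that runs several of the training configurations above.
runs = [
    ("static", "odometry", "./datasets/odometry/", "dct_road_odometry"),
    ("dynamic", "3Dobject", "./datasets/kitti/object/training", "dct_vehicle_3dobject"),
]
for type_, split, data_path, name in runs:
    subprocess.run(
        ["python3", "train.py", "--type", type_, "--split", split,
         "--data_path", data_path, "--model_name", name],
        check=True,
    )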

Evaluation and Saving predictions

  1. Prepare the corresponding dataset
  2. Download pre-trained models
  3. Run evaluation
  4. The results are in "output" (default: ./output)
# Evaluate on KITTI Odometry 
python3 eval.py --type static --split odometry --pretrained_path <path to the model directory> --data_path ./datasets/odometry --out_dir <path to the output directory> 

# Evaluate on KITTI 3D Object
python3 eval.py --type dynamic --split 3Dobject --pretrained_path <path to the model directory> --data_path ./datasets/kitti/object/training --out_dir <path to the output directory> 

# Evaluate on KITTI RAW
python3 eval.py --type static --split raw --pretrained_path <path to the model directory> --data_path ./datasets/kitti/raw/ --out_dir <path to the output directory> 

# Evaluate on Argoverse Tracking v1.0 (Road)
python3 eval.py --type static --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse/ --out_dir <path to the output directory>

# Evaluate on Argoverse Tracking v1.0 (Vehicle)
python3 eval.py --type dynamic --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse/ --out_dir <path to the output directory>

# Evaluate on Argoverse Tracking v1.0 (Vehicle and Road)
python3 eval.py --type both --split argo --pretrained_path <path to the model directory> --data_path ./datasets/argoverse/ --out_dir <path to the output directory> --num_class 3
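
The mIoU and mAP numbers in the tables below are computed on the predicted BEV masks against the ground truth. The snippet below is a generic illustration of the IoU part, with assumed binary inputs; it is not the repository's evaluation code.

import numpy as np

# Generic IoU between binary BEV masks (illustrative, assumes 0/1 arrays).
def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

pred = np.random.rand(256, 256) > 0.5
gt = np.random.rand(256, 256) > 0.5
print(f"IoU: {iou(pred, gt):.4f}")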

Pretrained Models

The following tables provide links to the pre-trained models for each dataset mentioned in our paper, together with the corresponding evaluation results.

Single-class learning

Dataset            | Segmentation Objects | mIoU (%) | mAP (%) | Pretrained Model
KITTI 3D Object    | Vehicle              | 39.44    | 58.89   | link
KITTI Odometry     | Road                 | 77.15    | 88.28   | link
KITTI RAW          | Road                 | 65.86    | 86.56   | link
Argoverse Tracking | Vehicle              | 48.04    | 68.96   | link
Argoverse Tracking | Road                 | 76.71    | 88.87   | link

Multi-class learning

Dataset            | Segmentation Objects | mIoU (%) | mAP (%) | Pretrained Model
Argoverse Tracking | Vehicle              | 31.75    | 46.20   | link (one model for both rows)
Argoverse Tracking | Road                 | 74.73    | 86.76   |

Results

Single Class Learning

(Figure: qualitative results for single-class learning)

Multi Class Learning

(Figure: qualitative results for multi-class learning)

Contact

If you encounter any problems, please describe them in the issues or contact:

License

This project is released under the MIT License.

Acknowledgements

Thanks to the related open-source works. This project partially depends on the sources of MonoLayout, PYVA, and JPerceiver.