Yuanchu Dang and Wei Luo
Our repo contains a PyTorch implementation of the Complex YOLO model with uncertainty for object detection in 3D.
Our code is inspired by and builds on an existing implementation of 2D YOLO and a sample Complex YOLO implementation.
Our further contributions are as follows:
- Added dropout layers and incorporated uncertainty into 3D object detection while preserving average precision.
- Projected predictions to 3D using homography (see the sketch after this list).
- Attempted to add novel loss terms to improve the model in cases where it predicts overlapping bounding boxes.
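As a concrete illustration of the homography step, here is a minimal sketch that maps bird's-eye-view (BEV) box corners into the camera image with OpenCV. The anchor correspondences below are made-up placeholders, not the fine-tuned points used in this repo.

```python
# Sketch of homography-based projection; the four BEV<->image anchor
# correspondences are illustrative placeholders.
import cv2
import numpy as np

bev_pts = np.float32([[100, 500], [400, 500], [400, 100], [100, 100]])
img_pts = np.float32([[300, 370], [900, 370], [700, 180], [500, 180]])

H, _ = cv2.findHomography(bev_pts, img_pts)

def project_box(corners_bev):
    """Map a (4, 2) array of BEV box corners into image coordinates."""
    pts = corners_bev.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```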
To run the model, you need to download and unzip the following data:
- Velodyne point clouds (29 GB): information about the surroundings of each frame, gathered by a Velodyne HDL-64 laser scanner. This is the primary data we use (see the loading sketch after this list).
- Left color images of object data set (12 GB): the cameras form color stereo pairs; we use the left images corresponding to the Velodyne point clouds for each frame.
- Camera calibration matrices of object data set (16 MB): used for calibrating and rectifying the data captured by the cameras and the laser scanner.
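For reference, KITTI stores each Velodyne frame as a flat binary file of float32 (x, y, z, reflectance) quadruples, so a single scan can be loaded as follows (the path is illustrative):

```python
# Load one KITTI Velodyne scan into an (N, 4) array of x, y, z, reflectance.
import numpy as np

def load_velodyne(path):
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

points = load_velodyne("data/training/velodyne/000000.bin")
```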
The following is a visualization of a sample image and its corresponding Velodyne point cloud.
First, you need a train.txt under data/training containing the 6-digit indices of the images you want in the training set, one index per line. See the sample file in this repo.
A reasonable train.txt can be generated by running:
python generate_train_txt.py
This produces a train.txt that contains the first 6000 images; use the optional arguments to set a different size.
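In essence, producing such a file boils down to writing one zero-padded 6-digit index per line; the sketch below mirrors the default of 6000 images, though the actual script may differ in its details:

```python
# Minimal sketch of generating train.txt: one zero-padded 6-digit index per line.
num_images = 6000  # matches the script's default size
with open("data/training/train.txt", "w") as f:
    for i in range(num_images):
        f.write("%06d\n" % i)
```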
Next, you can kick off the training process by executing:
python train.py
There are also optional arguments that control batch size, logging, learning rate, momentum, weight decay, and the number of epochs. To learn their usage, read the help strings in the argument parser and trace how each argument is used in the script.
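As a rough guide, the parser looks something like the sketch below; the exact flag names and default values here are assumptions, so consult `python train.py --help` for the authoritative list.

```python
# Illustrative argument parser; flag names and defaults are assumptions.
import argparse

parser = argparse.ArgumentParser(description="Train Complex YOLO with uncertainty")
parser.add_argument("--batch_size", type=int, default=12)
parser.add_argument("--log_interval", type=int, default=10)
parser.add_argument("--lr", type=float, default=1e-5)
parser.add_argument("--momentum", type=float, default=0.9)
parser.add_argument("--weight_decay", type=float, default=5e-4)
parser.add_argument("--epochs", type=int, default=400)
args = parser.parse_args()
```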
To run predictions on images, use predict.py. Suppose the directory for your trained model is model/ComplexYOLO_epoch400, and you want to predict and draw bounding boxes on the first 100 images. Execute the following command:
python predict.py 1 100 model/ComplexYOLO_epoch400
The first argument is the starting index of the image you want to predict, and the second argument is the ending index, both inclusive.
There are also two optional arguments:
- --mode: "train" or "eval" (case insensitive). Batch normalization and dropout layers remain active in "train" mode. Defaults to "eval".
- --num_predict: the number of times to evaluate each image; only meaningful in "train" mode, where dropout is active. Defaults to 1.
For example, if you wish to keep batch normalization and the dropout layers active and evaluate image 21, say, 1000 times to get a sense of the uncertainty in the model's predictions, run:
python predict.py 21 21 model/ComplexYOLO_epoch400 --mode train --num_predict 1000
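Under the hood this is the Monte Carlo dropout idea: keep the stochastic layers active at inference and aggregate repeated forward passes. A minimal sketch, assuming `model` is a trained ComplexYOLO network that returns a prediction tensor and `bev_input` is a preprocessed bird's-eye-view tensor:

```python
# Monte Carlo dropout sketch; `model` and `bev_input` are assumed to exist.
import torch

model.train()  # keeps dropout (and batch norm) in stochastic mode
with torch.no_grad():
    outputs = torch.stack([model(bev_input) for _ in range(1000)])

mean_pred = outputs.mean(dim=0)  # point estimate of the predictions
std_pred = outputs.std(dim=0)    # spread across passes, read as uncertainty
```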
The heat and project folders contain code for generating heatmaps and 3D projections, respectively.

The heatmap script loads a saved .npy file containing bounding-box predictions and a .png file with the corresponding road image. Note that running the heatmap script requires a plotly account: after running, the script uploads the resulting image to plotly, so change the configuration inside the script accordingly.

For projection, the script loads saved .npy files containing target and prediction boxes, as well as the original road image and the corresponding Velodyne point cloud with target and prediction boxes drawn. It also needs predefined heights and fine-tuned homography anchor points to produce an accurate 3D projection.
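As a rough illustration of the heatmap step, the sketch below bins box predictions into a 2D confidence map. The (x, y, confidence) layout of the .npy file is an assumption, and the sketch renders locally with plotly's offline mode, whereas the actual script uploads the figure to your plotly account.

```python
# Illustrative confidence heatmap from saved box predictions; the .npy layout
# (one (x, y, confidence) row per predicted box) is an assumption.
import numpy as np
import plotly.graph_objs as go
import plotly.offline as pyo

boxes = np.load("predictions.npy")  # hypothetical predictions file
heat, xedges, yedges = np.histogram2d(
    boxes[:, 0], boxes[:, 1], bins=64, weights=boxes[:, 2]
)

fig = go.Figure(data=go.Heatmap(z=heat.T, x=xedges, y=yedges))
pyo.plot(fig, filename="heatmap.html")
```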
Below are sample Velodyne point clouds with box predictions, along with the corresponding heatmaps showing our model's confidence.
Below is a comparison of average precision between the original Complex YOLO and our Complex YOLO with uncertainty.
You may refer to either our report or poster for more details.
For future work, we could train the model directly on labeled 3D data, making predictions without homography and visualizing uncertainty in 3D. We could also extend other models, such as Fast-RCNN, to 3D. Yet another direction would be to extend to 4D, as just presented at NeurIPS 2018: YOLO 4D!
We would like to thank Professor Iddo Drori and Chenqin for their constructive feedback throughout this project!