Accurate Low Latency Visual Perception for Autonomous Racing: Challenges Mechanisms and Practical Solutions
This is the Pytorch side code for the accurate low latency visual perception system introduced by Kieran Strobel, Sibo Zhu, Raphael Chang, and Skanda Koppula. "Accurate Low Latency Visual Perception for Autonomous Racing: Challenges Mechanisms and Practical Solutions" . If you use the code, please cite the paper:
BibTex Citation link will be given once the paper has been accepted
Abstract
Autonomous racing provides the opportunity to test safety-critical perception pipelines at their limit. This paper describes the practical challenges and solutions to applying state-of-the-art computer vision algorithms to build a low-latency, high-accuracy perception system for DUT18 Driverless(DUT18D), a 4WD electric race car with podium finishes at all Formula Driverless competitions for which it raced. The key components of DUT18D include YOLOv3-based object detection, pose estimation and time synchronization on its dual stereovision/monovision camera setup. We highlight modifications required to adapt perception CNNs to racing domains, improvements to loss functions used for pose estimation, and methodologies for sub-microsecond camera synchronization among other improvements. We perform an extensive experimental evaluation of the system, demonstrating its accuracy and low-latency in real-world racing scenarios.
CVC-YOLOv3 is the MIT Driverless Custom implementation of YOLOv3.
One of our main contributions to vanilla YOLOv3 is the custom data loader we implemented:
Each set of training images from a specific sensor/lens/perspective combination is uniformly rescaled such that their landmark size distributions matched that of the camera system on the vehicle. Each training image was then padded if too small or split up into multiple images if too large.
Our final accuracy metrics for detecting traffic cones on the racing track:
mAP | Recall | Precision |
---|---|---|
89.35% | 92.77% | 86.94% |
CVC-YOLOv3 Dataset with Formula Student Standard is open-sourced here
RektNet is the MIT Driverless Custom Key Points Detection Network.
RektNet takes in bounding boxes outputed from CVC-YOLOv3 and outputs seven key points on the traffic cone, which is responsible for depth estimation of traffic cones on the 3D map. v Our final Depth estimation error VS Distance graph (The Monocular part):
RektNet Dataset with Formula Student Driverless Standard is open-sourced here
This repository is released under the Apache-2.0 license. See LICENSE for additional details.