This project performs a comparative analysis between YOLOv3 and Complex-YOLO versions 3 and 4. Two models pre-trained on the KITTI dataset are used to detect three categories: pedestrians, cars, and cyclists.
This comparative analysis is useful in applications that require multiple sensors. For example, autonomous cars usually use a depth camera (2D) and a LiDAR (3D) to perceive their surroundings, so the accuracy of the two sensors, both individually and combined, must be high enough. This project therefore combines the LiDAR and camera sensors, providing a backup for the object detection task in case either of the two sensors is faulty.
The KITTI dataset is used in this project. Due to its extremely large size, we only used a smaller portion of the dataset, which is uploaded on this drive link. The data are organized as follows:
Folder | Content |
---|---|
Velodyne | contains the 3D point clouds |
Image | contains the RGB images from the camera (used for visualization) |
Label | contains the ground-truth bounding boxes and labels |
Calibration | contains the transformations between LiDAR and camera coordinates |
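The Velodyne scans use KITTI's standard binary format: a flat sequence of float32 values, four per point (x, y, z, reflectance). A minimal loading sketch (the function name and path are only examples):

```python
import numpy as np

def load_velodyne_scan(path):
    """Read a KITTI Velodyne .bin scan into an (N, 4) array:
    columns are x, y, z (metres, LiDAR frame) and reflectance."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)
```

The reflectance column is kept because the BEV preprocessing below uses it as an intensity channel.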
As a preprocessing step, all 3D point clouds are converted into 2D bird's-eye-view (BEV) images, since the YOLO models only deal with 2D images.
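The BEV conversion can be sketched as follows: crop the cloud to a region of interest, map metric x/y coordinates to pixel indices, and fill three channels (max height, max intensity, log-scaled point density), as in the Complex-YOLO family of models. The boundary values, grid size, and function name here are assumptions for illustration, not the exact values used in the notebooks:

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                       z_range=(-2.73, 1.27), size=608):
    """Discretize an (N, 4) LiDAR cloud (x, y, z, intensity) into a
    size x size x 3 BEV image: max-height, max-intensity, point-density."""
    x, y, z, r = points.T
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]
    # Metric coordinates -> pixel indices.
    px = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (size - 1)).astype(int)
    py = ((y - y_range[0]) / (y_range[1] - y_range[0]) * (size - 1)).astype(int)
    zn = (z - z_range[0]) / (z_range[1] - z_range[0])  # height scaled to [0, 1]
    bev = np.zeros((size, size, 3), dtype=np.float32)
    counts = np.zeros((size, size), dtype=np.float32)
    for i in range(px.shape[0]):
        bev[px[i], py[i], 0] = max(bev[px[i], py[i], 0], zn[i])  # max height
        bev[px[i], py[i], 1] = max(bev[px[i], py[i], 1], r[i])   # max intensity
        counts[px[i], py[i]] += 1.0
    # Log-scaled density, clipped to [0, 1].
    bev[:, :, 2] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))
    return bev
```

The resulting three-channel array can then be fed to the 2D detector exactly like an RGB image.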
We used three different models in this project. Complex-YOLO v3 and Complex-YOLO v4 are used with the point clouds, while YOLOv3 is used with the 2D images. The system design is shown in the following image.
Since two different versions of Complex-YOLO are used, their hyperparameters differ and are summarized in this table:
Hyperparameter | Complex-YOLO v3 | Complex-YOLO v4 |
---|---|---|
Image size | 608 x 608 x 3 | 608 x 608 x 3 |
Convolution Layers | 106 | 110 |
Batch Size | 64 | 64 |
Momentum | 0.9 | 0.949 |
Learning Rate | 0.001 | 0.0013 |
More technical details can be found in the project report and presentation in the Documentation folder.
A sample result from both models is shown below: the 2D bounding boxes (green) are from YOLOv3, and the 3D bounding boxes (other colors) are from Complex-YOLO.
This table shows the average precision (AP) of the two Complex-YOLO models:
Model | Car | Pedestrian | Cyclist | Average |
---|---|---|---|---|
Complex-YOLO-v3 | 0.98 | 0.75 | 0.75 | 0.833 |
Complex-YOLO-v4 | 1.00 | 0.86 | 1.00 | 0.95 |
```
${ROOT}
├── Documentation/
│   ├── Presentation.pdf
│   └── Project-Report.pdf
├── README.md
├── Complexv3_Yolo3.ipynb
└── Complex_Yolo4.ipynb
```
This project was contributed to equally by me and Sohaila Ahmed as part of the Computer Vision course by Prof. Adil Khan at Innopolis University (Spring 2022).