This project performs a comparative analysis between YOLOv3 and Complex-YOLO versions 3 and 4. Two models pre-trained on the KITTI dataset are used to detect three categories: pedestrians, cars, and cyclists.
This comparative analysis is useful in applications that require multiple sensors. For example, autonomous cars usually use a depth camera (2D) and a LiDAR (3D) to perceive their surroundings, so the accuracy of the two sensors, both individually and combined, must be high enough. This project therefore combines the LiDAR and camera sensors, providing a backup for the object detection task in case either of the two sensors is faulty.
The KITTI dataset is used in this project. Due to its extremely large size, we only used a smaller portion of the dataset, which is uploaded on this drive link. The data are organized as follows:
Folder | Content |
---|---|
Velodyne | contains the 3D point clouds |
Image | contains the RGB images from the camera (used for visualization) |
Label | contains the ground-truth bounding boxes and labels |
Calibration | contains the transformations between LiDAR and camera coordinates |
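The Velodyne scans use KITTI's standard binary format: a flat sequence of float32 values, four per point (x, y, z, reflectance). A minimal loading sketch (the function name and path are only examples):

```python
import numpy as np

def load_velodyne_scan(path):
    """Read a KITTI Velodyne .bin scan into an (N, 4) array:
    columns are x, y, z (metres, LiDAR frame) and reflectance."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)
```

The reflectance column is kept because the BEV preprocessing below uses it as an intensity channel.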
As a preprocessing step, all 3D point clouds are converted into 2D bird's-eye-view (BEV) images, since the YOLO models only deal with 2D images.
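The BEV conversion can be sketched as follows: crop the cloud to a region of interest, map metric x/y coordinates to pixel indices, and fill three channels (max height, max intensity, log-scaled point density), as in the Complex-YOLO family of models. The boundary values, grid size, and function name here are assumptions for illustration, not the exact values used in the notebooks:

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                       z_range=(-2.73, 1.27), size=608):
    """Discretize an (N, 4) LiDAR cloud (x, y, z, intensity) into a
    size x size x 3 BEV image: max-height, max-intensity, point-density."""
    x, y, z, r = points.T
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]
    # Metric coordinates -> pixel indices.
    px = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (size - 1)).astype(int)
    py = ((y - y_range[0]) / (y_range[1] - y_range[0]) * (size - 1)).astype(int)
    zn = (z - z_range[0]) / (z_range[1] - z_range[0])  # height scaled to [0, 1]
    bev = np.zeros((size, size, 3), dtype=np.float32)
    counts = np.zeros((size, size), dtype=np.float32)
    for i in range(px.shape[0]):
        bev[px[i], py[i], 0] = max(bev[px[i], py[i], 0], zn[i])  # max height
        bev[px[i], py[i], 1] = max(bev[px[i], py[i], 1], r[i])   # max intensity
        counts[px[i], py[i]] += 1.0
    # Log-scaled density, clipped to [0, 1].
    bev[:, :, 2] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))
    return bev
```

The resulting three-channel array can then be fed to the 2D detector exactly like an RGB image.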
We used three different models in this project. Complex-YOLO v3 and Complex-YOLO v4 are used with the point clouds, while YOLOv3 is used with the 2D images. The system design is shown in the following image.
Since two different versions of Complex-YOLO are used, their hyperparameters differ and are summarized in this table:
Hyperparameter | Complex-YOLO v3 | Complex-YOLO v4 |
---|---|---|
Image size | 608 x 608 x 3 | 608 x 608 x 3 |
Convolution Layers | 106 | 110 |
Batch Size | 64 | 64 |
Momentum | 0.9 | 0.949 |
Learning Rate | 0.001 | 0.0013 |
More technical details can be found in the project report and presentation in the Documentation folder.
A sample result from both models is shown below: the 2D bounding boxes (green) are from YOLOv3, and the 3D bounding boxes (other colors) are from Complex-YOLO.
This table shows the average precision (AP) of the two Complex-YOLO models:
Model | Car | Pedestrian | Cyclist | Average |
---|---|---|---|---|
Complex-YOLO-v3 | 0.98 | 0.75 | 0.75 | 0.833 |
Complex-YOLO-v4 | 1.00 | 0.86 | 1.00 | 0.95 |
```
${ROOT}
├── Documentation/
│   ├── Presentation.pdf
│   └── Project-Report.pdf
├── README.md
├── Complexv3_Yolo3.ipynb
└── Complex_Yolo4.ipynb
```
This project was contributed to equally by me and Sohaila Ahmed as part of the Computer Vision course by Prof. Adil Khan at Innopolis University (Spring 2022).