Vehicle Detection and Tracking
Goal - To write a software pipeline to detect and track vehicles in a video using a Linear Support Vector Machine (SVM).
Vehicle Detection Project
The goals / steps of this project are the following:
Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
Note: for those first two steps, don't forget to normalize your features and randomize the selection of data for training and testing (see the sketch after this list).
Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
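A minimal sketch of the first three steps (feature scaling, randomized split, Linear SVM training); the placeholder feature arrays and the test_size/random_state values are assumptions, standing in for the real HOG feature extraction shown later:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Placeholder feature arrays for illustration; in the real pipeline these come
# from HOG (and optional color) feature extraction on the labeled images.
# 5292 = 3 color channels x 1764 HOG values per channel for a 64x64 window.
rng = np.random.default_rng(0)
car_features = rng.normal(1.0, 1.0, size=(100, 5292))
notcar_features = rng.normal(0.0, 1.0, size=(100, 5292))

# Stack features and build labels: 1 = car, 0 = not car.
X = np.vstack((car_features, notcar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# Normalize features to zero mean and unit variance.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Randomized train/test split, then train the Linear SVM classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)
svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy:', svc.score(X_test, y_test))
```

The StandardScaler step matters because it keeps any single feature (e.g., one bright color bin) from dominating the SVM's decision.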
Explored different color spaces (RGB, HSV, HLS, LAB, YCrCb) and different skimage.hog() parameters (orientations, pixels_per_cell, and cells_per_block).
The HOG extractor extracts meaningful features of an image.
It captures the common aspects of cars, not the specifics of any one car.
It is the same as with humans (at first glance): we locate the car, not the model, the tires, or other small details.
It divides an image into small cells. For each cell, it computes a histogram of gradient directions over a given number of orientation bins.
The idea is that HOG captures the essence of the original image.
After taking educated guesses (and also trying parameter tuning), I arrived at a set of parameters that works well for detecting and separating cars from non-cars.
HOG also works well in the following color spaces: LUV, YUV, and YCrCb.
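As a sketch of that extraction step, assuming a parameter set typical for this project (9 orientations, 8x8 pixels per cell, 2x2 cells per block, all three YCrCb channels); the exact final values may differ:

```python
import cv2
import numpy as np
from skimage.feature import hog

def get_hog_features(channel, orient=9, pix_per_cell=8, cell_per_block=2):
    """Return the flattened HOG feature vector for a single image channel."""
    return hog(channel,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               block_norm='L2-Hys',
               transform_sqrt=True,
               feature_vector=True)

# Example: HOG features from all three YCrCb channels of a 64x64 patch.
img = np.zeros((64, 64, 3), dtype=np.uint8)   # stand-in for a training image
ycrcb = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
features = np.concatenate([get_hog_features(ycrcb[:, :, ch]) for ch in range(3)])
```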
I decided to search random window positions at random scales all over the image and came up with this (ok just kidding I didn't actually ;):
Adapted the find_cars method from the Udacity materials.
The method combines HOG feature extraction with a sliding window search.
Instead of running feature extraction on each window individually, which can be time-consuming, the HOG features are extracted once for the entire image.
The full-image features are subsampled according to the size of the window.
Then the respective portion is fed to the classifier.
Prediction on the HOG features is performed for each window, and a list of rectangle objects is returned for the windows where the classifier reports a match.
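A condensed sketch of that HOG-subsampling search, in the spirit of find_cars; the names, default values, and HOG-only feature vector here are illustrative assumptions rather than the exact Udacity code:

```python
import cv2
import numpy as np
from skimage.feature import hog

def find_cars(img, ystart, ystop, scale, svc, scaler,
              orient=9, pix_per_cell=8, cell_per_block=2, window=64):
    """Slide a window over the lower image region; return boxes the SVM flags as cars."""
    region = img[ystart:ystop, :, :]
    if scale != 1:
        region = cv2.resize(region, (int(region.shape[1] / scale),
                                     int(region.shape[0] / scale)))
    ycrcb = cv2.cvtColor(region, cv2.COLOR_RGB2YCrCb)

    # HOG over the whole region, once, with feature_vector=False
    # so each window can subsample the precomputed blocks.
    hogs = [hog(ycrcb[:, :, ch], orientations=orient,
                pixels_per_cell=(pix_per_cell, pix_per_cell),
                cells_per_block=(cell_per_block, cell_per_block),
                block_norm='L2-Hys', feature_vector=False)
            for ch in range(3)]

    nxblocks = ycrcb.shape[1] // pix_per_cell - cell_per_block + 1
    nyblocks = ycrcb.shape[0] // pix_per_cell - cell_per_block + 1
    nblocks_per_window = window // pix_per_cell - cell_per_block + 1
    cells_per_step = 2                      # shift the window two cells at a time
    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step + 1
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step + 1

    boxes = []
    for xb in range(nxsteps):
        for yb in range(nysteps):
            xpos, ypos = xb * cells_per_step, yb * cells_per_step
            # Subsample the full-image HOG blocks for this window (HOG only here;
            # the full pipeline may also append binned color features).
            feats = np.hstack([h[ypos:ypos + nblocks_per_window,
                                 xpos:xpos + nblocks_per_window].ravel()
                               for h in hogs])
            if svc.predict(scaler.transform(feats.reshape(1, -1)))[0] == 1:
                xleft, ytop = xpos * pix_per_cell, ypos * pix_per_cell
                size = int(window * scale)
                boxes.append(((int(xleft * scale), int(ytop * scale) + ystart),
                              (int(xleft * scale) + size,
                               int(ytop * scale) + ystart + size)))
    return boxes
```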
7. Multiple Detections
Multiple Detections 1
Multiple Detections 2
8. False Positives and Combining Overlapping Regions with a Heatmap
A heatmap is necessary to find the overlapping regions.
The amount of overlap can be used to measure confidence.
That confidence can be thresholded to remove false positives.
I recorded the positions of positive detections in each frame of the video. From the positive detections I created a heatmap and then thresholded that map to identify vehicle positions. I then used scipy.ndimage.measurements.label() to identify individual blobs in the heatmap, assumed each blob corresponded to a vehicle, and constructed bounding boxes to cover the area of each blob detected.
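A sketch of this heatmap stage; the helper names, the stand-in frame and box list, and the threshold value are assumptions:

```python
import cv2
import numpy as np
from scipy.ndimage import label   # scipy.ndimage.measurements.label in older SciPy

def add_heat(heatmap, box_list):
    """Add +1 heat for every pixel inside each detection box."""
    for ((x1, y1), (x2, y2)) in box_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    """Zero out pixels with too few overlapping detections."""
    heatmap[heatmap <= threshold] = 0
    return heatmap

def draw_labeled_bboxes(img, labels):
    """Draw one bounding box around each labeled blob (assumed to be one vehicle)."""
    for car_number in range(1, labels[1] + 1):
        ys, xs = (labels[0] == car_number).nonzero()
        cv2.rectangle(img, (int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max())), (0, 0, 255), 6)
    return img

# Stand-ins for a real video frame and the box list returned by find_cars;
# in the pipeline, heat is usually accumulated over several consecutive frames.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
boxes = [((800, 400), (928, 528)), ((820, 410), (948, 538))]

heat = add_heat(np.zeros(frame.shape[:2], dtype=np.float64), boxes)
heat = apply_threshold(heat, 1)     # threshold value is an assumption
labels = label(heat)                # one integer label per connected blob
result = draw_labeled_bboxes(np.copy(frame), labels)
```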
Here's an example result showing the heatmap from a series of frames of video, the result of scipy.ndimage.measurements.label() and the bounding boxes then overlaid on the last frame of video:
Heatmap without Thresholding
Heatmap with Thresholding
Heatmap in Grayscale
Combining the overlapping regions and forming a single bounding box per car.
Here is the output of scipy.ndimage.measurements.label() on the integrated heatmap from all six frames:
When we try to increase the number of windows, we compromise on the real-time speed requirement.
If we use the previous frame as an approximate position of the car in the next frame, we lose oncoming traffic, which changes its position drastically.
Hand-tuning the configuration parameters works fine but is not a scalable solution; e.g., if we had to detect bikes using the same code, it would not work.
The following neural networks would do a better job of detection without much tuning:
Single Shot Multibox Detector (SSD) with MobileNets
SSD with Inception V2
Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101
Faster RCNN with Resnet 101
Faster RCNN with Inception Resnet v2
Note: In most of the cases above, we only need to retrain the last few layers (transfer learning), as sketched below.
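As a rough illustration of that note (an assumption: a Keras classification head on a frozen MobileNetV2 backbone, rather than one of the full detectors listed above), freezing the pretrained layers and training only the new final layers looks like this:

```python
import tensorflow as tf

# Pretrained MobileNetV2 backbone with its classification head removed.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights='imagenet')
base.trainable = False                     # freeze all backbone layers

# Only this new head is trained, which is the "retrain the last layers" idea above.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),   # car vs. non-car
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```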