vehicle-tracking

Demo project showcasing computer-vision detection and tracking of moving vehicles over time


Vehicle Detection

Goals

Detection and tracking of vehicles on the road in near real-time.

cover.png

Link to Video

Implementation Overview

  1. Color histogram feature extraction
  2. Spatially binned color (subsampled image) feature extraction
  3. HOG feature extraction
  4. SVM linear classifier trained on a labeled binary class dataset
  5. Sliding window search with temporal heatmap false-positive reduction

Libraries

import numpy as np
import cv2
import os
import glob
import matplotlib.pyplot as plt
%matplotlib inline
from skimage.feature import hog
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import LinearSVC
import time
from moviepy.editor import VideoFileClip

Defaults Setup

plt.rcParams['figure.figsize'] = [14,8]
plt.rcParams['figure.frameon'] = False
plt.rcParams['figure.edgecolor'] = 'none'  
plt.rcParams['figure.facecolor'] = 'none' 
plt.rcParams['ytick.left'] = False
plt.rcParams['ytick.right'] = False
plt.rcParams['xtick.bottom'] = False
plt.rcParams['xtick.top'] = False
plt.rcParams['ytick.labelleft'] = False
plt.rcParams['ytick.labelright'] = False
plt.rcParams['xtick.labelbottom'] = False
plt.rcParams['xtick.labeltop'] = False
plt.rcParams['axes.grid'] = False
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['figure.subplot.hspace'] = 0.01
plt.rcParams['figure.subplot.wspace'] = 0.01
plt.rcParams['image.interpolation'] = 'bilinear'

Load Dataset

Load Vehicle Class Data

VEHICLE_PATH = './vehicles/**/*.png'
vehicle_paths = glob.glob(VEHICLE_PATH)
vehicle = [cv2.imread(path, cv2.IMREAD_COLOR)[...,::-1] for path in vehicle_paths]  # BGR -> RGB

Load Not-Vehicle Class Data

NOT_VEHICLE_PATH = './non-vehicles/**/*.png'
not_vehicle_paths = glob.glob(NOT_VEHICLE_PATH)
not_vehicle = [cv2.imread(path, cv2.IMREAD_COLOR)[...,::-1] for path in not_vehicle_paths]  # BGR -> RGB

Exploratory Analysis

Sample Count, Dimensionality, Data Type

print('vehicle sample count:',len(vehicle))
print('vehicle image shape:',vehicle[0].shape)
print('vehicle data type:',vehicle[0].dtype)
vehicle sample count: 8792
vehicle image shape: (64, 64, 3)
vehicle data type: uint8
print('not vehicle sample count:',len(not_vehicle))
print('not vehicle image shape:',not_vehicle[0].shape)
print('not vehicle data type:',not_vehicle[0].dtype)
not vehicle sample count: 8968
not vehicle image shape: (64, 64, 3)
not vehicle data type: uint8

Class Showcase

Not-Vehicles

all(ax.imshow(im) for im,ax in zip(not_vehicle[:15], plt.subplots(3,5)[1].ravel()))

png

Vehicles

all(ax.imshow(im) for im,ax in zip(vehicle[:15], plt.subplots(3,5)[1].ravel()))

png

Histogram Comparison

ax = plt.subplots(2,3)[1]
vehicle_arr, not_vehicle_arr = np.stack(vehicle), np.stack(not_vehicle)  # lists -> 4-D arrays for channel slicing
for i in range(3):
    hist = ax[0][i].hist(vehicle_arr[...,i].ravel(), bins=256)
    hist = ax[1][i].hist(not_vehicle_arr[...,i].ravel(), bins=256)
plt.tight_layout()

png

ax = plt.subplots(2,3)[1]
for i in range(3):
    hist = ax[0][i].hist(vehicle_arr[...,i].ravel(), bins=32)
    hist = ax[1][i].hist(not_vehicle_arr[...,i].ravel(), bins=32)
plt.tight_layout()

png

Feature Extractions

We extract 3 kinds of features:

  1. HSV color histogram
  2. Downsampled bins
  3. HOG

Data Preparation

We combine both sample classes for preprocessing.

x_data = vehicle+not_vehicle
len(x_data)
17760

Color Histogram

We produce a color histogram for each channel, then flatten and concatenate the results to produce the feature vector. The HSV color space was selected due to an anecdotal correlation with better accuracy during experimentation.

Color Transformation

def color_transform(img):
    return cv2.cvtColor(img, cv2.COLOR_RGB2HSV)  # RGB -> HSV

Histogram Computation

def extract_hist(img):
    # 128-bin histogram per channel, concatenated into one 384-length vector
    hist = [np.histogram(img[...,i], bins=128, range=(0,256))[0] for i in range(3)]
    return np.concatenate(hist)

Spatial Binning of Color

We produce spatial color bins by downsampling and flattening the input images. This feature map retains the correlations between individual pixel values at a coarse resolution. The (32,32) pixel size was selected empirically.

def extract_spatial(img):
    # downsample to 32x32 and flatten to a 1-D feature vector
    return cv2.resize(img, (32,32), interpolation=cv2.INTER_LINEAR).ravel()

HOG

We compute a Histogram of Oriented Gradients (HOG) per channel as classifier input. The HOG feature map provides a locally normalized gradient description of our classification targets. The OpenCV descriptor parameters (all sizes in pixels) are: window (64,64), block (4,4), block stride (4,4), cell (4,4), and 12 orientation bins. The values were set after an extended period of empirical tuning to reach above 99% classifier accuracy.

# cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)
hog_desc = cv2.HOGDescriptor((64,64), (4,4), (4,4), (4,4), 12)
def extract_hog(img):
    # per-channel HOG on the HSV image, concatenated into one vector
    hh = hog_desc.compute(img[...,0]).ravel()
    sh = hog_desc.compute(img[...,1]).ravel()
    vh = hog_desc.compute(img[...,2]).ravel()
    return np.concatenate((hh,sh,vh))

Combining Features

def combine_features(cf,sf,hf):
    return np.concatenate((cf,sf,hf))

Composite Extraction Pipeline

def extract_features(img):
    img = color_transform(img)
    cf,sf,hf = extract_hist(img), extract_spatial(img), extract_hog(img)
    return combine_features(cf,sf,hf)
combined_f = list(map(extract_features, x_data))
len(combined_f)
17760

Classification Preprocessing

Stacking

X = np.vstack(combined_f).astype(np.float64)
X.shape
(17760, 12672)
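
As a sanity check, the 12672 columns match the sum of the three feature blocks, using the histogram, spatial, and HOG parameters defined above:

hist_len    = 3 * 128                   # 384  : color histogram, 128 bins per channel
spatial_len = 32 * 32 * 3               # 3072 : downsampled 32x32x3 image, flattened
blocks      = ((64 - 4) // 4 + 1) ** 2  # 256  : block positions in a 64x64 window
hog_len     = blocks * 1 * 12 * 3       # 9216 : 1 cell/block x 12 bins x 3 channels
hist_len + spatial_len + hog_len        # 12672, matching X.shape[1]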

Labeling

We've attached binary labels corresponding to our target classes: 0:non-vehicle, 1:vehicle.

y = np.concatenate((np.ones(len(vehicle), dtype=bool), np.zeros(len(not_vehicle), dtype=bool)))
y.shape
(17760,)

Randomization

X,y = shuffle(X,y)

Test Validation Split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Normalization

The data is normalized per column (per feature) using statistics fitted on the training set only. The same fitted parameters are then used to normalize the validation and production inputs.

X_scaler = StandardScaler().fit(X_train)  # fit scaler to training data only
X_train = X_scaler.transform(X_train)
X_test = X_scaler.transform(X_test)
def normalize(samples):
    return X_scaler.transform(samples)

SVM Classifier

We'll use a linear SVM: a classifier that finds the maximum-margin hyperplane separating the two classes in feature space, so prediction reduces to a single dot product per sample and stays fast enough for near real-time use.
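
As a minimal illustration of the idea (the toy points and labels below are made up for this note, not part of the project pipeline), LinearSVC learns a weight vector and bias that define the separating hyperplane:

# Toy example only: two linearly separable point clouds in 2-D.
toy_X = np.array([[0,0],[0,1],[1,0],[2,2],[2,3],[3,2]], dtype=np.float64)
toy_y = np.array([0,0,0,1,1,1])
toy_svc = LinearSVC().fit(toy_X, toy_y)
toy_svc.coef_, toy_svc.intercept_        # the learned hyperplane: w and b
toy_svc.predict([[0.5,0.5],[2.5,2.5]])   # -> array([0, 1])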

SVC Training

We've used a grid hyperparameter search to estimate the optimal classifier kernel and penalty values. We found the RBF kernel to offer greater accuracy, but its prediction time is too long for real-time usage. We've therefore chosen the linear classifier with a penalty parameter ('C') of 1.

svc = GridSearchCV(LinearSVC(), {'C':[1,3,5,7,10]}, n_jobs=4)
svc.fit(X_train, y_train)
print(' Done Training')
 Done Training
svc.best_params_
{'C': 1}

SVC Evaluation

Test Accuracy

print(' Test Accuracy =', round(svc.score(X_test, y_test), 4))
 Test Accuracy = 0.9899

Prediction Time

t=time.time(); n_predict = 1000;  svc.predict(X_test[0:n_predict]); t2 = time.time()
print('',round(t2-t, 5), 'Seconds to predict', n_predict,'labels')
 0.01119 Seconds to predict 1000 labels

Sliding Window Search

We've implemented a uniform-size sliding window search. The top and bottom of the frame (sky and hood) are discarded, as they fall outside our region of interest. A temporal prediction heatmap for false-positive reduction is implemented in the video pipeline section.

def slide_window(img, size, overlap):
    # 'overlap' is the step expressed as a fraction of the window size (1.0 = no overlap);
    # y is limited to 352:544, the road band between sky and hood
    sy, sx = np.mgrid[352:544:int(size*overlap), 0:1280:int(size*overlap)]
    sy, sx = sy.ravel(), sx.ravel()
    return list(map(lambda y,x: (img[y:y+size,x:x+size],(x,y)), sy,sx))

Bounding Box Extraction and Grouping

def get_rects(pred, win, size):
    pos = np.argwhere(pred==True)               # indices of positive windows
    cord = np.array(win)[pos][...,1][...,0]     # the (x,y) corner stored with each window
    rects = [[int(c[0]),int(c[1]), int(c[0]+size), int(c[1]+size)] for c in cord]
    # the list is doubled so cv2.groupRectangles (groupThreshold=1) keeps lone detections
    return cv2.groupRectangles(rects*2, 1, 0.1)[0]

Bounding Box Visualization

def draw_rects(img, rects):
    # draw a 3-px blue box for every grouped detection
    for r in rects:
        cv2.rectangle(img, (r[0],r[1]), (r[2],r[3]), (0, 0, 255), 3)
    return img
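
The detection visualizations below call a helper named vis_detection that isn't defined in this excerpt. A minimal reconstruction consistent with its call sites, combining the two functions above:

def vis_detection(img, pred, windows, size):
    # group positive windows into boxes and draw them on a copy of the frame
    rects = get_rects(pred, windows, size)
    return draw_rects(np.copy(img), rects)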

Test Image

img = cv2.imread('./test_images/test1.jpg', cv2.IMREAD_COLOR)[:,:,::-1]
plt.imshow(img)

png

Small Sliding Window

Small windows (64x64 pixels) reliably produce multiple detections per vehicle. However, they also produce a lot of false positives and carry a significant performance penalty.

small_window_list = slide_window(img, 64, 1.0)
all(ax.imshow(win[0]) for ax,win in zip(plt.subplots(5,20)[1].ravel(),small_window_list))

png

Classification

f_list = list(map(lambda im: extract_features(im[0]), small_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Bounding Boxes

imout = vis_detection(img, pred, small_window_list,64)
plt.imshow(imout)

png

Medium Sliding Window

Medium windows (128x128 pixels) were found to be unreliable, even when combined with a generous overlap factor.

medium_window_list = slide_window(img, 128, 1.0)
all(ax.imshow(im[0]) for ax,im in zip(plt.subplots(3,10)[1].ravel(),medium_window_list))
True

png

Classification

f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), medium_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False], dtype=bool)

Bounding Boxes

imout = vis_detection(img, pred, medium_window_list,128)
plt.imshow(imout)

png

Medium Window at Half Overlap

medium_window_list = slide_window(img, 128, 0.5)
f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), medium_window_list))
pred = svc.predict(normalize(f_list))
imout = vis_detection(img, pred, medium_window_list,128)
plt.imshow(imout)       

png

Large Sliding Window

Large windows were practically unusable without further dataset augmentation.

large_window_list = slide_window(img, 256, 1.0)
all(ax.imshow(im[0]) for ax,im in zip(plt.subplots(1,5)[1].ravel(),large_window_list))

png

Classification

f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), large_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False, False, False], dtype=bool)

Video Pipeline

The main distinction of the video pipeline is the addition of a temporal heatmap, implemented as an array of per-window counters: a counter is incremented whenever its window fires and decremented otherwise, and a prediction is accepted as genuine only once its counter crosses a threshold.

def heatmap(pred, heat):
    # increment counters for windows that fired, decrement the rest
    heat = np.where(pred==True, heat+1, heat-1)
    # clamp counters to [1, 32]
    heat = np.where(heat<1, 1, heat)
    heat = np.where(heat>32, 32, heat)
    # accept a window as a genuine detection only above a count of 4
    pred = np.where(heat>4, True, False)
    return pred, heat
WINDOW_SIZE = 96
def process(img):
    windows = slide_window(img, WINDOW_SIZE, 0.3)  # step = 0.3 x window for dense coverage
    features = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), windows))
    predictions = svc.predict(normalize(features))
    # temporally smooth predictions before drawing boxes
    predictions, process.heat = heatmap(predictions, process.heat)
    rects = get_rects(predictions, windows, WINDOW_SIZE)
    img = draw_rects(img, rects)
    return img
process.heat = np.ones(322, dtype=np.uint8)  # 322 = window count from slide_window at size 96, overlap 0.3
video = VideoFileClip("project_video.mp4")#.subclip(27,30)
# video = VideoFileClip("test_video.mp4")
processed_video = video.fl_image(process)
processed_video.write_videofile("output_video.mp4", audio=False, progress_bar=False)
[MoviePy] >>>> Building video output_video.mp4
[MoviePy] Writing video output_video.mp4
[MoviePy] Done.
[MoviePy] >>>> Video ready: output_video.mp4 

Link to Output Video
Link to Source Video

Discussion

  1. There are a lot of false positives. Much improvement could be gained through data augmentation and classifier tweaking, or by replacing the classifier altogether.

  2. The implementation doesn't exploit the available data parallelism, and overall throughput is currently very low; see the sketch after this list. Furthermore, some features are re-computed multiple times.

  3. No multi-scale, adaptive, or randomized sliding-window techniques were used.
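
Regarding the parallelism point, per-window feature extraction is independent and CPU-bound, so it fans out naturally across cores. A minimal sketch of one possible approach (not part of the project; the worker count is an arbitrary assumption):

from multiprocessing import Pool

def parallel_features(windows, workers=4):
    # resize once per window, then extract features across a process pool
    patches = [cv2.resize(w[0], (64,64)) for w in windows]
    with Pool(workers) as pool:
        return pool.map(extract_features, patches)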