Object Detection

1. Using a Pretrained Image Classifier to Detect Objects with Keras and OpenCV

Here we take a convolutional neural network trained for image classification (a pretrained ResNet-50) and use image pyramids, sliding windows, and non-maxima suppression to build a basic object detector. In short, we combine traditional computer vision object detection techniques with deep learning.

In image classification:

Input: image --> Output: class label
We present the input image to our neural network and obtain a single class label, together with the probability associated with that prediction. This label characterizes the contents of the image (specifically, its most dominant and visible content).
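As a minimal sketch of this input/output contract, assume the network has already produced one probability per class. With Keras and ResNet-50 this would be the array returned by `model.predict`, which `tf.keras.applications.resnet50.decode_predictions` can map to ImageNet label names. The `top1` helper, the three-class label set, and the probabilities below are hypothetical, chosen only for illustration:

```python
# Hypothetical sketch: reduce a classifier's per-class probabilities
# to the single (label, probability) pair that image classification reports.

def top1(probs, class_names):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]

class_names = ["cat", "dog", "car"]  # hypothetical label set
probs = [0.10, 0.75, 0.15]           # hypothetical softmax output

label, prob = top1(probs, class_names)
print(f"{label}: {prob:.2f}")  # dog: 0.75
```

Note that the output says nothing about *where* the dog is; that is exactly what object detection adds.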

In object detection:

Along with the class labels (i.e. which objects are present in the image), the detector also outputs where each object is located, in the form of bounding box coordinates.

More specifically, it outputs three things:
1. A list of bounding boxes, i.e. the (x, y)-coordinates of each object in the image
2. The class label associated with each bounding box
3. The probability/confidence score associated with each bounding box and class label
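One way to hold these three outputs together is a small record per detection. The `Detection` class, the `(startX, startY, endX, endY)` box convention, and the example values below are assumptions for illustration, not a format required by Keras or OpenCV:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Detection:
    """Hypothetical record bundling the three detector outputs."""
    box: Tuple[int, int, int, int]  # (startX, startY, endX, endY) in pixels
    label: str                      # class label for this box
    prob: float                     # confidence score for label + box

# Made-up detections for a single image
detections = [
    Detection(box=(50, 40, 210, 200), label="dog", prob=0.91),
    Detection(box=(300, 80, 420, 190), label="cat", prob=0.78),
]

for d in detections:
    print(d.label, d.box, f"{d.prob:.2f}")
```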

How can a deep learning image classifier be converted into an object detector?

We borrow elements of traditional computer vision algorithms to turn our CNN image classifier into an object detector.

The first element we use is the image pyramid:

An “image pyramid” is a multi-scale representation of an image:

At the bottom of the pyramid is the original-sized image. At each subsequent layer, the image is resized (subsampled) and optionally smoothed. Subsampling continues until a stopping criterion is met (e.g. a minimum size has been reached), after which no further layers are generated.
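A minimal sketch of an image pyramid generator, assuming a grayscale image stored as a plain list of rows so the example runs without extra libraries. In a real Keras/OpenCV pipeline the hypothetical `resize_nearest` helper would be replaced by `cv2.resize(image, (new_w, new_h))` (note that OpenCV takes the size as width, height). The `scale` and `min_size` values are assumptions; `min_size` is usually set to the sliding-window size. Smoothing is omitted here:

```python
def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of a 2D list of pixel values (hypothetical helper)."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

def image_pyramid(image, scale=1.5, min_size=(224, 224)):
    """Yield the original image, then progressively smaller copies.

    Stops as soon as the next layer would be narrower than min_size[0]
    or shorter than min_size[1] (the stopping criterion).
    """
    yield image  # bottom layer: original size
    while True:
        h, w = len(image), len(image[0])
        new_w, new_h = int(w / scale), int(h / scale)
        if new_w < min_size[0] or new_h < min_size[1]:
            break
        image = resize_nearest(image, new_w, new_h)
        yield image

# Toy 12x9 "image" with a tiny min_size so the pyramid has a few layers
img = [[x + 10 * y for x in range(12)] for y in range(9)]
layers = list(image_pyramid(img, scale=1.5, min_size=(4, 4)))
print([(len(layer[0]), len(layer)) for layer in layers])  # [(12, 9), (8, 6), (5, 4)]
```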

The second element we use is the sliding window:

A sliding window is a fixed-size rectangle that slides from left to right and top to bottom within an image:

At each stop of the window, we:

1. Extract the image region (ROI) within the sliding window
2. Pass that region to the image classifier
3. Obtain predictions (class label and probability score)
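A minimal sketch of the window-extraction step, again assuming the image is a plain list of rows. The `step` and `window_size` values are assumptions. In the full detector each yielded `roi` would be resized to the classifier's input size (224x224 for ResNet-50) and passed to the model; that part is left out so the example stays self-contained:

```python
def sliding_window(image, step, window_size):
    """Yield (x, y, roi) for each window position, left to right, top to bottom.

    window_size is (width, height); only windows that fit entirely inside
    the image are produced.
    """
    win_w, win_h = window_size
    img_h, img_w = len(image), len(image[0])
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            roi = [row[x:x + win_w] for row in image[y:y + win_h]]
            yield x, y, roi

# Toy 6x4 image; pixel value encodes its position (x + 10*y)
img = [[x + 10 * y for x in range(6)] for y in range(4)]
for x, y, roi in sliding_window(img, step=2, window_size=(2, 2)):
    print((x, y), roi)
```

The step size trades accuracy for speed: a smaller step localizes objects more precisely but multiplies the number of classifier calls.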

Together, image pyramids and sliding windows help us localize objects at different locations and at multiple scales of the input image.
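The reason the combination handles multiple scales is that a fixed-size window on a shrunken pyramid layer covers a larger region of the original image. A minimal sketch, working only with coordinates: `candidate_boxes` (a hypothetical helper) walks the pyramid layer sizes, slides the window over each layer, and rescales every window back to original-image coordinates. The window size, step, and scale are assumptions:

```python
def candidate_boxes(img_w, img_h, win=(4, 4), step=2, scale=1.5):
    """Return every sliding-window box across all pyramid layers,
    expressed in ORIGINAL image coordinates as (startX, startY, endX, endY)."""
    boxes = []
    layer_w, layer_h = img_w, img_h
    while layer_w >= win[0] and layer_h >= win[1]:
        # How much this layer was shrunk relative to the original
        rx, ry = img_w / layer_w, img_h / layer_h
        for y in range(0, layer_h - win[1] + 1, step):
            for x in range(0, layer_w - win[0] + 1, step):
                boxes.append((
                    int(x * rx), int(y * ry),
                    int((x + win[0]) * rx), int((y + win[1]) * ry),
                ))
        # Next (smaller) pyramid layer
        layer_w, layer_h = int(layer_w / scale), int(layer_h / scale)
    return boxes

boxes = candidate_boxes(8, 8, win=(4, 4), step=2, scale=2)
print(len(boxes), boxes[0], boxes[-1])
```

With an 8x8 image and a 4x4 window, the full-size layer yields nine small boxes, while the half-size layer yields a single window that maps back to the whole image.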

The third element we use is non-maxima suppression:

Object detectors generally output multiple, overlapping bounding boxes around each object in an image.
This happens because as the sliding window approaches an object, the classifier outputs increasingly large probabilities for that object's class, so several neighbouring windows (and pyramid scales) all produce confident detections of the same object.

Since only one object is actually present at that location, multiple bounding boxes for it are a problem.
The solution is to apply non-maxima suppression (NMS), which discards weaker bounding boxes that overlap heavily with a higher-confidence box, keeping only the most confident ones.
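A minimal sketch of greedy NMS, assuming boxes in `(startX, startY, endX, endY)` form and intersection-over-union (IoU) as the overlap measure; some implementations instead measure overlap relative to a box's own area. The 0.3 threshold is an assumption. TensorFlow also provides a built-in, `tf.image.non_max_suppression(boxes, scores, max_output_size, iou_threshold)`, which expects boxes as `[y1, x1, y2, x2]`. In practice NMS is typically applied separately for each class label:

```python
def iou(a, b):
    """Intersection-over-union of two (startX, startY, endX, endY) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    drop every other box whose IoU with it exceeds overlap_thresh.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep

# Two overlapping boxes on one object, plus one box on a separate object
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.90, 0.75, 0.80]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68, so it is suppressed, while box 2 does not overlap box 0 at all and survives.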

1.3 The steps we follow in the Object Detection Algorithm: