Localization-of-Shafts-using-YOLO

Object localization finds the location of an object by providing a bounding box around it. For localization, we start the same way as for a classification problem. Because we want to localize the object, the classification network has to be modified to output additional parameters: the height and width of the bounding box along with the x and y coordinates of its top-left corner. Popular algorithms include R-CNN and its family, Single Shot Detector, Spatial Pyramid Pooling and YOLO.
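As a minimal sketch of what "modifying the classification network" means for its output, the snippet below splits one raw output vector into class probabilities and a box. The layout `[class logits..., x, y, w, h]` and the helper names `split_localization_output` and `box_to_corners` are hypothetical, chosen only for illustration. The box uses the top-left-corner parameterization described above.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_localization_output(raw: List[float], num_classes: int):
    """Split a raw head output into class probabilities and a box.

    Hypothetical layout: [class logits..., x, y, w, h], where (x, y) is the
    top-left corner of the box and w, h are its width and height.
    """
    if len(raw) != num_classes + 4:
        raise ValueError("expected num_classes + 4 values")
    probs = softmax(raw[:num_classes])
    x, y, w, h = raw[num_classes:]
    return probs, (x, y, w, h)

def box_to_corners(box: Tuple[float, float, float, float]):
    # Convert (x, y, w, h) with a top-left origin to (x1, y1, x2, y2).
    x, y, w, h = box
    return (x, y, x + w, y + h)

probs, box = split_localization_output([2.0, 0.5, 10.0, 20.0, 30.0, 40.0], num_classes=2)
print(box_to_corners(box))  # (10.0, 20.0, 40.0, 60.0)
```

The classification part is unchanged (a softmax over class logits). Only four extra regression outputs are appended for the box.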

YOLO: YOLO stands for You Only Look Once and is considered a state-of-the-art model for real-time detection systems. In simple words, YOLO is a network that goes through the image only once, in a single forward pass. The image is divided into regions of the same size, and for each region the network predicts a bounding box along with the probabilities of the prediction. At test time, the output is therefore a bounding box along with the class of the object detected in each image region. YOLO's main advantage over other algorithms in real-world use is its computational speed, which it achieves while keeping competitive accuracy.
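To make "divided into regions of the same size" concrete, here is a small sketch that computes the cell bounds of an S x S grid and finds which cell contains a given point. The image size 448 x 448 and S = 7 are illustrative values only. The function names `grid_cells` and `cell_of_point` are hypothetical helpers, not part of YOLO itself.

```python
from typing import List, Tuple

def grid_cells(width: int, height: int, s: int) -> List[Tuple[float, float, float, float]]:
    """Return the (x1, y1, x2, y2) bounds of an s x s grid of equal-size cells,
    listed row by row starting from the top-left corner."""
    cw, ch = width / s, height / s
    return [(col * cw, row * ch, (col + 1) * cw, (row + 1) * ch)
            for row in range(s) for col in range(s)]

def cell_of_point(x: float, y: float, width: int, height: int, s: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the point (x, y)."""
    # min() keeps points on the right/bottom image border inside the last cell.
    col = min(int(x / (width / s)), s - 1)
    row = min(int(y / (height / s)), s - 1)
    return row, col

cells = grid_cells(448, 448, 7)
print(len(cells))                            # 49
print(cells[0])                              # (0.0, 0.0, 64.0, 64.0)
print(cell_of_point(100, 300, 448, 448, 7))  # (4, 1)
```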

Here we have changed the approach: instead of a bounding box, each region outputs a keypoint pair. Since we are predicting a single object category (shafts only), there is only one object class. As in YOLO, the image is divided into small regions (grid cells), and a grid cell is responsible for detecting an object if the object's center point falls inside it. Each cell predicts a confidence score along with a keypoint pair. The confidence score denotes how confident the model is in its prediction, and the keypoint pair consists of the coordinates X and Y together with an angle alpha. The predicted values are denoted (X0, Y0, alpha), and the actual (ground-truth) center coordinates and angle of the shaft are (x0, y0, alpha).
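A minimal sketch of how ground-truth shafts could be turned into training targets under the "center point decides the responsible cell" rule. It assumes a hypothetical per-cell layout `[class_prob, x_off, y_off, alpha, confidence]`, where `x_off` and `y_off` are the center's position inside its cell scaled to [0, 1]. This cell-relative encoding and the name `encode_shaft_targets` are illustrative assumptions, not necessarily what this project uses. Alpha is passed through unchanged, in whatever unit the annotations use.

```python
from typing import List, Tuple

def encode_shaft_targets(shafts: List[Tuple[float, float, float]],
                         width: int, height: int, s: int) -> List[List[List[float]]]:
    """Build an s x s x 5 target grid from ground-truth shafts (x0, y0, alpha).

    Hypothetical per-cell layout: [class_prob, x_off, y_off, alpha, confidence].
    Cells that contain no shaft center stay all zeros.
    """
    target = [[[0.0] * 5 for _ in range(s)] for _ in range(s)]
    cw, ch = width / s, height / s
    for x0, y0, alpha in shafts:
        # The cell containing the shaft center is responsible for that shaft.
        col = min(int(x0 / cw), s - 1)
        row = min(int(y0 / ch), s - 1)
        # Position of the center inside its cell, scaled to [0, 1].
        x_off = x0 / cw - col
        y_off = y0 / ch - row
        # Single class (shaft): class probability and confidence are 1 here.
        # If two centers fall in the same cell, the later one overwrites it.
        target[row][col] = [1.0, x_off, y_off, alpha, 1.0]
    return target

t = encode_shaft_targets([(100.0, 300.0, 0.5)], 448, 448, 7)
print(t[4][1])  # [1.0, 0.5625, 0.6875, 0.5, 1.0]
```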

Each keypoint prediction therefore has 5 output parameters: class probability, X0, Y0, alpha and confidence score.
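The sketch below shows one way the 5 output parameters per cell could be turned back into shaft keypoints in image pixels. It assumes the values are stored in the order listed above, with X0 and Y0 as offsets inside the cell in [0, 1]. That offset convention, the 0.5 threshold and the name `decode_keypoints` are all hypothetical. Multiplying class probability by confidence is one plausible way to get a single score, borrowed from how YOLO combines class and confidence at test time.

```python
from typing import List, Tuple

def decode_keypoints(pred: List[List[List[float]]], width: int, height: int,
                     conf_threshold: float = 0.5) -> List[Tuple[float, float, float, float]]:
    """Turn an s x s x 5 prediction grid into shaft keypoints.

    Assumes each cell holds [class_prob, X0, Y0, alpha, confidence], with X0, Y0
    as offsets inside the cell in [0, 1]. Returns (x, y, alpha, score) tuples in
    image pixels, sorted by score from highest to lowest.
    """
    s = len(pred)
    cw, ch = width / s, height / s
    results = []
    for row in range(s):
        for col in range(s):
            class_prob, x_off, y_off, alpha, conf = pred[row][col]
            # Combine class probability and confidence into one score.
            score = class_prob * conf
            if score < conf_threshold:
                continue
            # Map the cell-relative offsets back to absolute pixel coordinates.
            x = (col + x_off) * cw
            y = (row + y_off) * ch
            results.append((x, y, alpha, score))
    return sorted(results, key=lambda r: r[3], reverse=True)

pred = [[[0.0] * 5 for _ in range(2)] for _ in range(2)]
pred[1][0] = [1.0, 0.5, 0.25, 0.3, 0.9]  # confident shaft
pred[0][1] = [1.0, 0.5, 0.5, 0.0, 0.2]   # low confidence, filtered out
print(decode_keypoints(pred, 100, 100))  # [(25.0, 62.5, 0.3, 0.9)]
```

Since there is only one class, the class probability mainly acts as a second confidence factor. With a single object category it could also be dropped from the output.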