Project 13 - Person Detection in Thermal Images

Built on the Darknet framework, available at: https://github.com/AlexeyAB/darknet

About the project

Input: Thermal image from surveillance video recording

Dataset: http://ieee-dataport.org/open-access/thermal-image-dataset-person-detection-uniri-tid

Output: detection of the person in the image (bounding box with confidence score)

Requirements:

Recognition and localization of the person
Multiple persons should be detected
Evaluating detection performance - average precision

Basic Terms

Classification - categorizing objects in a picture

Example: this picture contains a person

Detection - categorizing and locating objects in a picture

Segmentation - dividing a picture into segments that represent objects or their parts by grouping pixels into larger components [https://missinglink.ai/guides/computer-vision/image-segmentation-deep-learning-methods-applications/]

[http://ronny.rest/tutorials/module/seg_01/segmentation_01_intro/]

YOLO - You Only Look Once

YOLO is a state-of-the-art, real-time object detection system that works as follows:

A single neural network is applied to the full image
The network divides the image into regions and predicts bounding boxes and probabilities for each region
The bounding boxes are weighted by the predicted probabilities and then post-processed

[https://pjreddie.com/darknet/yolo/]
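The weighting and filtering step described above can be sketched in a few lines of Python. This is only an illustration and not Darknet's actual code: the record layout `(box, objectness, class_probability)`, the function name `weight_and_filter`, and the 0.25 threshold are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def weight_and_filter(
    predictions: List[Tuple[Box, float, float]],
    threshold: float = 0.25,  # assumed value, chosen only for illustration
) -> List[Tuple[Box, float]]:
    """Weight each box by its probabilities and keep the confident ones.

    Each prediction is (box, objectness, class_probability); the final
    confidence is the product of the two probabilities.
    """
    kept = []
    for box, objectness, class_probability in predictions:
        confidence = objectness * class_probability
        if confidence >= threshold:
            kept.append((box, confidence))
    # Most confident detections first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

A real detector would normally follow this with further post-processing, such as removing overlapping boxes that cover the same object.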

Evaluation metrics

AP - the average precision over all 10 IoU thresholds (i.e., [0.50:0.05:0.95]) and all object categories

AP (IoU = 0.50) - the average precision over all object categories when the IoU overlap with ground truth is larger than 0.50

AP (IoU = 0.75) - the average precision over all object categories when the IoU overlap with ground truth is larger than 0.75

MS COCO original script for calculation [https://github.com/cocodataset/cocoapi]

Script [https://github.com/Cartucho/mAP] for IoU display on selected images

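IoU (intersection over union) - the area of overlap between a detection and a ground-truth object divided by the area of their union; Darknet reports the average IoU of objects and detections for a certain threshold

mAP (mean average precision) - the mean of the average precisions of all classes

Because we have only one class (person), mAP is the same as AP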

[https://github.com/AlexeyAB/darknet]
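To make the IoU definition concrete, here is a minimal Python sketch. It is not taken from the evaluation scripts linked above; the corner-based box layout `(x_min, y_min, x_max, y_max)` and the names `iou` and `IOU_THRESHOLDS` are assumptions for the example.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]

# The 10 thresholds 0.50, 0.55, ..., 0.95 over which AP is averaged.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return intersection / union if union > 0 else 0.0
```

For example, a 10 x 10 ground-truth box and a detection of the same size shifted by half its width share 50 of 150 covered units, so their IoU is 1/3 and the detection would not count as correct at any of the thresholds above.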

Database

Available at: http://ieee-dataport.org/open-access/thermal-image-dataset-person-detection-uniri-tid (University of Rijeka, UNIRI-TID)

train set: 3790 images
val set: 2527 images
ratio: 60 : 40

We distributed the different types of weather scenes evenly between the two sets
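One way to get such an even distribution is to split each weather group separately. The sketch below is hypothetical and is not the script used for this project: the function name `stratified_split`, the weather labels, and the file names are all made up for illustration.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    images_by_weather: Dict[str, List[str]],  # hypothetical: weather label -> image names
    train_fraction: float = 0.6,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split every weather group separately so both sets see all conditions."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for weather in sorted(images_by_weather):
        images = list(images_by_weather[weather])
        rng.shuffle(images)
        cut = round(len(images) * train_fraction)
        train.extend(images[:cut])
        val.extend(images[cut:])
    return train, val


# Usage with made-up labels and file names.
groups = {
    "clear": [f"clear_{i}.jpg" for i in range(10)],
    "fog": [f"fog_{i}.jpg" for i in range(5)],
}
train_set, val_set = stratified_split(groups)
print(len(train_set), len(val_set))
```

Splitting per group keeps the 60 : 40 ratio inside every weather condition, not just over the data set as a whole.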

Training

For training and testing we used the Darknet framework [https://github.com/AlexeyAB/darknet]

In the .cfg file for training we set the network input resolution, the batch size, and the number of training iterations:

width = 416
height = 416
batch = 16
max_batches = 6000
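These values can be edited by hand, but a small helper makes the change repeatable. The following is a sketch under the assumption that the options live as `key=value` lines in the `[net]` section of the .cfg file; the helper name `set_net_options` is hypothetical and not part of Darknet.

```python
def set_net_options(cfg_text: str, options: dict) -> str:
    """Return cfg_text with the given key=value pairs replaced in the [net] section.

    Keys that are missing from the section are left out rather than added,
    and commented lines are never touched.
    """
    out, in_net = [], False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section starts; only [net] holds the training options.
            in_net = stripped == "[net]"
        elif in_net and "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in options:
                line = f"{key}={options[key]}"
        out.append(line)
    return "\n".join(out)


# The values listed above.
TRAINING_OPTIONS = {"width": 416, "height": 416, "batch": 16, "max_batches": 6000}
```

Restricting the replacement to `[net]` avoids accidentally changing similarly named keys, such as `batch_normalize`, in the layer sections.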

Using transfer learning, we started from a YOLOv4 model pre-trained on the MS COCO dataset and trained it on the thermal images for about 5 hours

Realization of tasks

Task 1: Recognition and localization of a person

We wrote a person detection script. It reads images from an input folder, runs YOLOv4 detection on each one, and saves the images with the detections drawn on them to an output folder

Task 2: Detecting multiple people

Additionally, along with finding bounding boxes and confidence scores, we added code to count the number of people in an image
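Counting can be as simple as tallying the detections labelled as a person. The snippet below is a minimal sketch, not the project's actual code: the detection record `(class_name, confidence, box)`, the function name `count_people`, and the 0.5 cut-off are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical detection record: (class_name, confidence, (x, y, width, height))
Detection = Tuple[str, float, Tuple[int, int, int, int]]


def count_people(detections: List[Detection], min_confidence: float = 0.5) -> int:
    """Count detections labelled 'person' at or above the confidence cut-off."""
    return sum(
        1
        for class_name, confidence, _box in detections
        if class_name == "person" and confidence >= min_confidence
    )


# Usage with made-up detections for one image.
example = [
    ("person", 0.97, (34, 50, 20, 60)),
    ("person", 0.81, (120, 48, 22, 58)),
    ("person", 0.30, (200, 10, 8, 15)),  # below the cut-off, not counted
]
print(count_people(example))  # prints 2
```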

Task 3: Evaluating detection performance - average precision

We evaluated the performance of YOLOv4 on thermal images for two models: one trained only on the MS COCO data set, and one additionally trained by transfer learning on 60% of the obtained thermal images
Improvement in each metric (rounded):
- AP from 14% to 54%
- AP50 from 28% to 99%
- AP75 from 10% to 50%
- APs from 2% to 40%
- APm from 13% to 54%
- APL from 52% to 61%

Successful result example

Unsuccessful result example

Results

Detection results of a YOLOv4 model trained only on the MS COCO data set

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.135
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.276
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.100
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.131
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.517

Number of ground-truth objects: 3714

Number of detected objects: 1128 (tp:1106, fp:22)

Detection results of a model trained on the MS COCO data set and thermal images

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.987
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.502
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.403
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.544
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611

Number of ground-truth objects: 3714

Number of detected objects: 3892 (tp:3658, fp:234)
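The tp/fp counts reported with the two result listings can also be turned into precision and recall. This is a small worked calculation added for illustration, not output from the evaluation scripts; the function name `precision_recall` is made up, and the confidence threshold at which the counts were collected is not stated here.

```python
def precision_recall(tp: int, fp: int, ground_truth: int):
    """Precision = tp / (tp + fp); recall = tp / number of ground-truth objects."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / ground_truth if ground_truth else 0.0
    return precision, recall


# Counts copied from the two result listings above.
coco_only = precision_recall(tp=1106, fp=22, ground_truth=3714)
fine_tuned = precision_recall(tp=3658, fp=234, ground_truth=3714)

print(f"MS COCO only:    precision={coco_only[0]:.3f}, recall={coco_only[1]:.3f}")
print(f"Thermal-trained: precision={fine_tuned[0]:.3f}, recall={fine_tuned[1]:.3f}")
```

By these counts, recall rises from about 0.30 to about 0.98 after training on thermal images, while precision drops slightly from about 0.98 to about 0.94: the model pre-trained only on MS COCO misses most people, whereas the thermal-trained model finds nearly all of them at the cost of some extra false positives.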

Conclusion

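Our task was to build a person detector for infrared (thermal) images. We chose YOLOv4 with the Darknet framework because it appeared to be the fastest and most accurate option. The results were better than we expected: after transfer learning on the thermal images, AP rose from 0.135 to 0.536 and AP at IoU = 0.50 from 0.276 to 0.987.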

sdumencic/Project_13
