Thermal images are mainly used to detect the presence of people at night or in poor lighting conditions, but detectors that rely on them perform poorly during the day. To address this, most state-of-the-art techniques use a fusion network that combines features from paired thermal and color images. We propose to augment thermal images with their saliency maps, which serve as an attention mechanism that provides better cues to the pedestrian detector, especially during the day. We investigate how this approach improves pedestrian detection using only thermal images, eliminating the need for paired color images. We train a state-of-the-art Faster R-CNN for pedestrian detection and explore the added effect of PiCA-Net and R3-Net as saliency detectors. Our proposed approach yields an absolute improvement of 13.4 points and 19.4 points in log-average miss rate over the baseline on day and night images, respectively. We also annotate and release pixel-level masks of pedestrians on a subset of the KAIST Multispectral Pedestrian Detection dataset, which is the first publicly available dataset for salient pedestrian detection.
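The results above are reported as log-average miss rate. As a minimal sketch of how this metric is commonly computed in pedestrian detection benchmarks, the function below averages the miss rate in log space at nine false-positives-per-image (FPPI) reference points spaced evenly between 10^-2 and 10^0. The function name, the input format (a curve sorted by increasing FPPI), and the choice of nine reference points are assumptions made for illustration; the exact evaluation code behind the reported numbers is not given here.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate over FPPI reference points in [1e-2, 1e0].

    fppi and miss_rate are equal-length sequences describing a
    miss-rate-vs-FPPI curve, sorted by increasing FPPI (assumed format).
    """
    # Reference points evenly spaced in log space between 1e-2 and 1e0.
    refs = [10.0 ** (-2.0 + 2.0 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Miss rate at the largest FPPI that does not exceed the reference.
        candidates = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # If the curve never reaches this FPPI, count it as a full miss.
        sampled.append(candidates[-1] if candidates else 1.0)

    # Geometric mean; the floor avoids log(0) for a perfect detector.
    return math.exp(sum(math.log(max(m, 1e-10)) for m in sampled) / num_points)
```

Lower values are better. Because the mean is taken in log space, reference points the curve never reaches count as a miss rate of 1.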
We select 1702 images from the training set of the KAIST Multispectral Pedestrian dataset by sampling, among the images that contain pedestrians, every 15th image captured during the day and every 10th image captured at night. These sampling rates were chosen to obtain approximately the same number of images for both times of day (913 day images and 789 night images), containing 4170 pedestrian instances in total. We manually annotate these images using the VGG Image Annotator tool to generate ground-truth saliency masks, guided by the locations of the pedestrian bounding boxes in the original dataset. Additionally, we create a set of 362 images with similar annotations from the test set to validate our deep saliency detection networks, with 193 day images and 169 night images, containing 1029 pedestrian instances. The distribution of pedestrians per frame is shown in the figure below:
- Thermal Images
- Thermal Images Fused with Static Saliency Maps
- Thermal Images Fused with Saliency Maps generated using PiCA-Net
- Thermal Images Fused with Saliency Maps generated using R3-Net
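This page does not spell out how a saliency map is fused with a thermal image. One plausible scheme, assumed here purely for illustration, is to replicate the single-channel thermal image into three channels (the input format most detectors expect) and overwrite one of those channels with the saliency map. The function name `fuse_thermal_with_saliency` and the choice of which channel to replace are hypothetical.

```python
import numpy as np


def fuse_thermal_with_saliency(thermal, saliency, channel=2):
    """Build a 3-channel image from a thermal frame and its saliency map.

    thermal and saliency are 2-D arrays of identical shape (for example
    uint8 images). Replacing one duplicated thermal channel with the
    saliency map is an assumed fusion scheme, not a confirmed one.
    """
    thermal = np.asarray(thermal)
    saliency = np.asarray(saliency)
    if thermal.ndim != 2 or thermal.shape != saliency.shape:
        raise ValueError("thermal and saliency must be 2-D arrays of equal shape")

    # Replicate the thermal frame into three identical channels (H x W x 3).
    fused = np.stack([thermal, thermal, thermal], axis=-1)
    # Overwrite one channel with the saliency map.
    fused[..., channel] = saliency
    return fused
```

Under this assumption the fused image keeps the thermal intensities in two channels, so a detector that takes 3-channel input can consume it without architectural changes.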
If you find this work or dataset useful, please consider citing:
@inproceedings{Kaist_Salient_Pedestrian_Dataset,
author = {Debasmita Ghose and
Shasvat Desai and
Sneha Bhattacharya and
Deep Chakraborty and
Madalina Fiterau and
Tauhidur Rahman},
title = {Pedestrian Detection in Thermal Images using Saliency Maps},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
pages = {},
year = {2019}
}
*Authors Contributed Equally