This dataset contains 2048 image groups, each consisting of three modalities (a visible image, a depth image, and a thermal image). All images have the same resolution of 640×480. The dataset covers 34 household items captured in the seven most common household scenes. The proportions of each scene and item category are shown in the following figure.
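Because every group should contain exactly one image per modality at 640×480, a quick consistency check is useful before training. The sketch below assumes that each group's image sizes have already been read (for example with an image library) into a dict keyed by the hypothetical modality names "V", "D" and "T". `ImageGroup` and `validate_group` are illustrative names, not part of the released code.

```python
from dataclasses import dataclass

# Resolution stated in the dataset description: 640×480 (width × height).
EXPECTED_SIZE = (640, 480)
# Hypothetical keys for the visible, depth and thermal images of a group.
MODALITIES = ("V", "D", "T")


@dataclass
class ImageGroup:
    """One hypothetical V-D-T group: modality name -> (width, height)."""
    group_id: str
    sizes: dict


def validate_group(group: ImageGroup) -> list:
    """Return a list of problems found in a group (empty list means valid)."""
    problems = []
    for modality in MODALITIES:
        if modality not in group.sizes:
            problems.append(f"{group.group_id}: missing {modality} image")
        elif tuple(group.sizes[modality]) != EXPECTED_SIZE:
            problems.append(
                f"{group.group_id}: {modality} is {group.sizes[modality]}, "
                f"expected {EXPECTED_SIZE}"
            )
    return problems


if __name__ == "__main__":
    ok = ImageGroup("0001", {"V": (640, 480), "D": (640, 480), "T": (640, 480)})
    bad = ImageGroup("0002", {"V": (640, 480), "D": (320, 240)})
    print(validate_group(ok))   # []
    print(validate_group(bad))  # size mismatch for D, missing T
```

Running such a check over all 2048 groups catches missing or resized files before they silently break batching.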
The V (visible) modality mainly involves seven challenging scenes. V-SA (similar appearance): the salient object has a color or shape similar to the background. V-BSO (big salient object): the ratio of salient pixels to the total number of pixels in the image is greater than 0.08. V-SSO (small salient object): the ratio of salient pixels to the total number of pixels in the image is less than 0.007. V-MSO (multiple salient objects): the image contains more than one salient object. V-LI (low illumination): the images are collected under low illumination, so objects are hard to identify visually. V-SI (side illumination): light comes from the side of the salient objects, so their brightness is uneven. V-NI (no illumination): the images are collected without illumination, and objects are visually difficult to identify.
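The size-based scenes are defined by explicit thresholds on the salient-pixel ratio, so they can be checked directly against a binary ground-truth mask. The following is a minimal sketch, not the authors' labeling procedure: it applies the stated 0.08 and 0.007 thresholds (the same 0.007 threshold also defines D-SSO) and, as an assumption, approximates the V-MSO object count by counting 4-connected salient regions. That approximation can fail when one object splits into several regions or two objects touch. All function names are hypothetical.

```python
from collections import deque

# Thresholds taken from the scene definitions.
BSO_RATIO = 0.08    # V-BSO: salient-pixel ratio greater than 0.08
SSO_RATIO = 0.007   # V-SSO / D-SSO: salient-pixel ratio less than 0.007


def salient_ratio(mask):
    """Fraction of pixels marked salient (non-zero) in a 2-D binary mask."""
    total = sum(len(row) for row in mask)
    salient = sum(1 for row in mask for v in row if v)
    return salient / total


def count_objects(mask):
    """Count 4-connected salient regions (assumption: one region = one object)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                # Breadth-first flood fill over the current region.
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def size_and_count_tags(mask):
    """Return the size/count-based tags (BSO, SSO, MSO) a GT mask satisfies."""
    tags = []
    r = salient_ratio(mask)
    if r > BSO_RATIO:
        tags.append("BSO")
    elif 0 < r < SSO_RATIO:  # an empty mask gets no size tag
        tags.append("SSO")
    if count_objects(mask) > 1:
        tags.append("MSO")
    return tags


if __name__ == "__main__":
    # 10×10 toy mask with two separate 3×3 squares: 18/100 = 0.18 > 0.08.
    toy = [[0] * 10 for _ in range(10)]
    for y in range(3):
        for x in range(3):
            toy[y][x] = 1
            toy[y + 6][x + 6] = 1
    print(salient_ratio(toy), size_and_count_tags(toy))  # 0.18 ['BSO', 'MSO']
```

For a full 640×480 mask, an array library would be faster; plain lists keep the sketch dependency-free.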
The D (depth) modality mainly involves four challenging scenes. D-BM (background messy): the background is cluttered when no wallpaper is used. D-II (information incomplete): partially missing depth information leaves the depth of salient objects incomplete. D-SSO (small salient objects): the ratio of salient pixels to the total number of pixels in the image is less than 0.007. D-BI (background interference): a wallpaper background interferes with the depth information.
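The dataset gives no numeric criterion for D-II, but the notion of incomplete depth on a salient object can be made concrete. The sketch below assumes, without confirmation for this dataset, that missing depth measurements are stored as 0 (a common convention for raw sensor output), and reports what fraction of the salient pixels lack depth. `missing_depth_fraction` and its `invalid_value` parameter are hypothetical.

```python
def missing_depth_fraction(depth, mask, invalid_value=0):
    """Fraction of salient pixels whose depth equals `invalid_value`.

    Assumption: missing depth is encoded as `invalid_value` (0 by default);
    this is not confirmed for the released depth images.
    """
    salient = 0
    missing = 0
    for d_row, m_row in zip(depth, mask):
        for d, m in zip(d_row, m_row):
            if m:
                salient += 1
                if d == invalid_value:
                    missing += 1
    # An empty mask has no salient pixels, so nothing can be missing.
    return missing / salient if salient else 0.0


if __name__ == "__main__":
    depth = [[0, 120, 0], [130, 0, 140]]
    mask = [[1, 1, 0], [1, 1, 0]]
    print(missing_depth_fraction(depth, mask))  # 2 of 4 salient pixels -> 0.5
```

A higher fraction would indicate a more severe D-II case; any cut-off for labeling would be a further assumption.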
The T (thermal) modality mainly involves three challenging scenes. T-Cr (crossover): the salient object has a temperature similar to that of its surroundings or of other objects. T-RD (radiation dispersion): part of a salient object appears more salient than the object as a whole. T-HR (heat reflection): the thermal radiation of the salient object is reflected.
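T-Cr describes a low thermal contrast between the object and its surroundings. As a rough illustration, not the authors' criterion, the sketch below compares the mean thermal intensity inside the ground-truth mask with the mean over all background pixels; using the whole background rather than only nearby regions or other objects is a simplifying assumption, and no threshold is implied. `thermal_contrast` is a hypothetical name.

```python
def thermal_contrast(thermal, mask):
    """Absolute difference between the mean thermal intensity inside and
    outside the salient mask. A small value suggests the T-Cr case.

    Assumption: the whole background is used as the reference region.
    """
    inside, outside = [], []
    for t_row, m_row in zip(thermal, mask):
        for t, m in zip(t_row, m_row):
            (inside if m else outside).append(t)
    if not inside or not outside:
        raise ValueError("mask must contain both salient and background pixels")
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


if __name__ == "__main__":
    mask = [[1, 1, 0], [1, 1, 0]]
    print(thermal_contrast([[200, 200, 50], [200, 200, 50]], mask))  # 150.0
    print(thermal_contrast([[60, 60, 55], [60, 60, 55]], mask))      # 5.0, crossover-like
```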
The overall architecture of the proposed HWSI method and its two main modules are shown in the following figure.
Visual comparison of the saliency maps produced by the proposed model and the latest methods under different challenging scenes. Visual comparison results when two modalities are disturbed.
The dataset and code are available at: https://pan.baidu.com/s/1JyFBtjlJGf4GE2zeciN1wQ?pwd=bipy
https://ieeexplore.ieee.org/document/9931143/
K. Song, J. Wang, Y. Bao, L. Huang and Y. Yan, "A Novel Visible-Depth-Thermal Image Dataset of Salient Object Detection for Robotic Visual Perception," in IEEE/ASME Transactions on Mechatronics, vol. 28, no. 3, pp. 1558-1569, June 2023, doi: 10.1109/TMECH.2022.3215909.
[1] Lightweight Multi-level Feature Difference Fusion Network for RGB-D-T Salient Object Detection [J]. Journal of King Saud University - Computer and Information Sciences, 2023. https://github.com/VDT-2048/MFDF
[2] MFFNet: Multi-modal Feature Fusion Network for VDT Salient Object Detection [J]. IEEE Transactions on Multimedia, 2023. https://ieeexplore.ieee.org/abstract/document/10171982
[3] Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection [J]. IEEE Transactions on Image Processing, 2024, 33, 3212-3226. https://ieeexplore.ieee.org/abstract/document/10516304 (code: https://github.com/Lx-Bao/QSFNet)
[1] Multiple Graph Affinity Interactive Network and A Variable Illumination Dataset for RGBT Image Salient Object Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(7), 3104-3118. https://github.com/huanglm-me/VI-RGBT1500
[2] RGB-T Image Analysis Technology and Application: A Survey [J]. Engineering Applications of Artificial Intelligence, 2023, 120, 105919. https://www.sciencedirect.com/science/article/abs/pii/S0952197623001033