Memory-Guided Collaborative Attention for Nighttime Thermal Infrared Image Colorization of Traffic Scenes

PyTorch implementation of the paper "Memory-Guided Collaborative Attention for Nighttime Thermal Infrared Image Colorization of Traffic Scenes".

(Teaser figure)

Qualitative Comparison of Video Colorization

Example 1: CycleGAN vs. PearlGAN vs. MornGAN (video comparison)

Example 2: CycleGAN vs. PearlGAN vs. MornGAN (video comparison)

Abstract

Robust imaging under challenging conditions, such as starlit nights, has broadened the adoption of thermal infrared (TIR) cameras for nighttime driving scenes. Given that TIR images are monochromatic, which makes them difficult to interpret by humans and limits the applicability of RGB-based algorithms, it is reasonable to perform colorization of nighttime TIR (NTIR) images by converting them into corresponding daytime color images (NTIR2DC). Despite the impressive results achieved by previous NTIR2DC methods, how to improve the colorization performance of small-sample categories without semantic annotation is under-explored. To address this issue, we propose a novel learning framework called Memory-guided cOllaboRative atteNtion Generative Adversarial Network (MornGAN), which is inspired by the analogical reasoning mechanisms of humans. Specifically, we first propose an online semantic distillation module to mine and refine the semantic cues of NTIR images. Then, a memory-guided sample selection strategy and adaptive collaborative attention loss are devised to enhance the semantic preservation of small-sample categories. Further, a new conditional gradient repair loss is introduced for reducing edge distortion during translation. Extensive experiments on the NTIR2DC task show that the proposed MornGAN significantly outperforms other image-to-image translation methods in terms of semantic preservation and edge consistency, which helps improve the object detection accuracy remarkably.

Prerequisites

  • Python 3.8
  • PyTorch 1.7.1 and torchvision 0.8.2
  • TensorboardX
  • visdom
  • dominate
  • pytorch-msssim
  • kmeans_pytorch
  • CUDA 11.6.55, CuDNN 8.4, and Ubuntu 20.04.

Data Preparation

Download FLIR and KAIST. First, sample the training and test set images according to the txt files in the ./img_list/ folder. Then, resize all images to 500x400 and center-crop them to 360x288 (a preprocessing sketch is provided after the folder layouts below). Note that, due to an oversight by the authors, the test set images of the KAIST dataset only need to be center-cropped to 360x288, without the resize step. Finally, place all images into the corresponding dataset folders. Domain A and domain B correspond to the daytime visible images and the nighttime TIR images, respectively. As an example, the folder structure for the FLIR dataset is:

mkdir FLIR_datasets
# The directory structure should be this:
FLIR_datasets
  ├── trainA (daytime RGB images)
      ├── FLIR_00002.png 
      └── ...
  ├── trainB (nighttime IR images)
      ├── FLIR_00135.png
      └── ...
  ├── testA (testing daytime RGB images)
      ├── FLIR_09112.png (The test image that you want)
      └── ... 
  ├── testB (testing nighttime IR images)
      ├── FLIR_08872.png (The test image that you want)
      └── ... 

mkdir FLIR_testsets
# The directory structure should be this:
FLIR_testsets
  ├── test0 (empty folder)
  ├── test1 (testing nighttime IR images)
      ├── FLIR_08863.png
      └── ...

We predict the edge maps of nighttime TIR images and daytime color images using the MCI method and the Canny edge detector, respectively. Next, place all edge maps into the corresponding folders (e.g., /FLIR_IR_edge_map/ and /FLIR_Vis_edge_map/ for the FLIR dataset).
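
For the daytime color images, the Canny edge maps can be generated with OpenCV as in the sketch below; the thresholds and folder paths are assumptions, and the TIR edge maps are produced separately with the MCI method.

# Illustrative Canny edge-map sketch for the daytime color images (not the official script).
import os
import cv2

src_dir, dst_dir = './FLIR_datasets/trainA', './FLIR_Vis_edge_map'  # placeholder paths
os.makedirs(dst_dir, exist_ok=True)
for name in sorted(os.listdir(src_dir)):
    gray = cv2.imread(os.path.join(src_dir, name), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    edges = cv2.Canny(gray, 100, 200)  # hypothetical thresholds
    cv2.imwrite(os.path.join(dst_dir, name), edges)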

For segmentation mask prediction of DC images, we first utilize the HMSANet and Detectron2 models to obtain initial masks. Then, the predictions from the two models are fused and semantic denoising is performed; this can be done by running MaskFusedDenoised_demo.m. You will need to modify the four paths in the code to suit your setup. The final masks of the DC images used for training on the FLIR and KAIST datasets can be downloaded via Google Drive. Next, place all segmentation masks into the corresponding folders (i.e., /FLIR_Vis_seg_mask/ and /KAIST_Vis_seg_mask/ for the FLIR and KAIST datasets, respectively).
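
The actual fusion and semantic denoising are implemented in MaskFusedDenoised_demo.m; the Python snippet below is only a conceptual sketch of one simple fusion rule (keep pixels where the two predictions agree, mark disagreements with an ignore label), not the authors' procedure. The file names and ignore index are assumptions.

# Conceptual mask-fusion sketch (NOT equivalent to MaskFusedDenoised_demo.m).
import numpy as np
from PIL import Image

IGNORE_LABEL = 255  # assumed ignore index

def fuse_masks(hmsanet_mask_path, detectron2_mask_path, out_path):
    m1 = np.array(Image.open(hmsanet_mask_path))     # label map predicted by HMSANet
    m2 = np.array(Image.open(detectron2_mask_path))  # label map predicted by Detectron2
    fused = np.where(m1 == m2, m1, IGNORE_LABEL).astype(np.uint8)
    Image.fromarray(fused).save(out_path)

# Example usage (placeholder file names):
# fuse_masks('hmsanet/FLIR_00002.png', 'detectron2/FLIR_00002.png', 'FLIR_Vis_seg_mask/FLIR_00002.png')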

Inference Using Pretrained Model

1) FLIR

Download and unzip the pretrained model and save it in ./checkpoints/MornGAN_FLIR/. Place the test images of the FLIR dataset in ./FLIR_testsets/test1/. Then run the command

python test_output_only.py --phase test --serial_test --name MornGAN_FLIR --dataroot ./FLIR_testsets/ --n_domains 2 --which_epoch 80 --results_dir ./res_FLIR/ --loadSize 288 --net_Gen_type gen_v1 --no_flip --gpu_ids 0
2) KAIST

Download and unzip the pretrained model and save it in ./checkpoints/MornGAN_KAIST/. Place the test images of the KAIST dataset in ./KAIST_testsets/test1/. Then run the command

python test_output_only.py --phase test --serial_test --name MornGAN_KAIST --dataroot ./KAIST_testsets/ --n_domains 2 --which_epoch 160 --results_dir ./res_KAIST/ --loadSize 288 --net_Gen_type gen_v1 --no_flip --gpu_ids 0

Training

To reproduce the reported performance, we recommend running multiple training sessions.

1) FLIR

Place the corresponding images in each subfolder of ./FLIR_datasets/. Then run the command

bash ./train_FLIR.sh
2) KAIST

Place the corresponding images in each subfolder of ./KAIST_datasets/. Then run the command

bash ./train_KAIST.sh

Evaluation

1) Semantic segmentation

Download the code for the semantic segmentation model HMSANet and follow its instructions to install it. Next, download the model pretrained on the Cityscapes dataset, and change line 52 of config.py to the path of the folder containing these pretrained weights. After that, download the segmentation masks and code for both datasets via Google Drive. Put misc.py in the folder ./utils/, replacing the original file; all other files go inside the /semantic-segmentation-main/ directory. For the evaluation on the FLIR dataset, run the command

python -m torch.distributed.launch --nproc_per_node=1 eval_FLIR.py --dataset cityscapes --syncbn --apex --fp16 --eval_folder /Your_FLIR_Results_Path --snapshot /Your_Pretrained_Models_Path/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth --dump_assets --dump_all_images --result_dir ./Your_FLIR_Mask_SavePath

And for the evaluation on the KAIST dataset, run the command

python -m torch.distributed.launch --nproc_per_node=1 eval_KAIST.py --dataset cityscapes --syncbn --apex --fp16 --eval_folder /Your_KAIST_Results_Path --snapshot /Your_Pretrained_Models_Path/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth --dump_assets --dump_all_images --result_dir ./Your_KAIST_Mask_SavePath
2) Object detection

Download the code for YOLOv7 and follow its instructions to install it. Next, download the YOLOv7 detection txt files that we converted from the FLIR and KAIST datasets via Google Drive. After unzipping, place all files in the /yolov7-main/ folder. Note that the files FLIR.yaml, FLIR_imglist.txt, KAIST.yaml and KAIST_imglist.txt should be placed in the /yolov7-main/data/ directory. Then, the translation results of FLIR and KAIST should be placed inside the /yolov7-main/FLIR_datasets/images/ and /yolov7-main/KAIST_datasets/images/ directories, respectively. For the evaluation on the FLIR dataset, run the command

python test.py --data data/FLIR.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights pretrain_weights/yolov7.pt --name FLIR_640_val --verbose

And for the evaluation on the KAIST dataset, run the command

python test.py --data data/KAIST.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 2 --weights pretrain_weights/yolov7.pt --name KAIST_640_val --verbose
3) Edge consistency

Please refer to the PearlGAN repository.

Downloading files using Baidu Cloud Drive

If the above Google Drive links are unavailable, you can download the relevant code and files through the Baidu Cloud link (extraction code: morn).

Citation

If you find our work useful and use the code or models in your research, please cite it as follows.

@article{luo2024memory,
  title={Memory-guided collaborative attention for nighttime thermal infrared image colorization of traffic scenes},
  author={Luo, Fu-Ya and Cao, Yi-Jun and Yang, Kai-Fu and Wang, Gang and Li, Yong-Jie},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2024},
  publisher={IEEE}
}

License

The codes and the pretrained model in this repository are under the BSD 2-Clause "Simplified" license as specified by the LICENSE file.

Acknowledgments

This code borrows heavily from ToDayGAN.
The spectral normalization code is borrowed from BigGAN-PyTorch.