
TextFusion

This is the official implementation of our Information Fusion 2025 paper "TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion". Paper Link

"To generate appropriate fusion results for a specific scenario, existing methods cannot realize it or require expensive retraining. The same goal can be achieved by simply adjusting the focused objectives of textual description in our paradigm."

Highlights

  • For the first time, the text modality is introduced into the image fusion field.
  • A benchmark dataset (the IVT dataset) pairing images with textual descriptions.
  • A textual attention assessment.

IVT dataset

"Statistic information of the proposed dataset."

Train Set [Images&Text]: Google Drive

Train Set [Pre-generated Association Maps]: Google Drive

Test Set: Google Drive

The proposed model

Folder structure:

/dataset
--/IVT_train
----/ir
------/1.png
----/vis
------/1.png
----/text
------/1_1.txt
----/association
------/IVT_LLVIP_2000_imageIndex_1_textIndex_1
--------/Final_Finetuned_BinaryInterestedMap.png
/TextFusion
--/main_trainTextFusion.py
--/net.py
--/main_test_rgb_ir.py
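
As an illustration of how one training sample maps onto this layout, the minimal Python sketch below (written for this README, not taken from the repository's data pipeline) loads the infrared image, visible image, textual description, and pre-generated association map for image index 1 and text index 1. It assumes 8-bit PNG images, UTF-8 text files, and the association-directory naming shown in the tree above; see main_trainTextFusion.py for the actual loading code.

import cv2

root = "dataset/IVT_train"  # adjust to where the dataset sits relative to your working directory
image_index, text_index = 1, 1

# Infrared and visible images share the same numeric index.
ir = cv2.imread(f"{root}/ir/{image_index}.png", cv2.IMREAD_GRAYSCALE)
vis = cv2.imread(f"{root}/vis/{image_index}.png", cv2.IMREAD_COLOR)

# Each image can have several descriptions, stored as <imageIndex>_<textIndex>.txt.
with open(f"{root}/text/{image_index}_{text_index}.txt", encoding="utf-8") as f:
    description = f.read().strip()

# Pre-generated binary association ("interested") map for this image/text pair;
# the directory name follows the example in the tree above.
assoc_dir = f"{root}/association/IVT_LLVIP_2000_imageIndex_{image_index}_textIndex_{text_index}"
association = cv2.imread(f"{assoc_dir}/Final_Finetuned_BinaryInterestedMap.png", cv2.IMREAD_GRAYSCALE)

print(ir.shape, vis.shape, association.shape, description[:60])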

To train

Assuming you have already downloaded (from the links above) the pre-generated association maps, images, and corresponding textual descriptions into the "IVT_train" folder.

(The code to generate the association maps on your own is coming soon.)

Simply run the following command to start the training process:

python main_trainTextFusion.py

The trained models and corresponding loss values will be saved in the "models" folder.
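If you want to inspect a saved checkpoint outside of the provided test scripts, a rough sketch is given below. The class name TextFusionNet and the checkpoint filename are placeholders invented for this example, not the repository's actual identifiers; check net.py and the files written to the "models" folder for the real names.

import torch
import net  # network definition shipped in this repository

# Placeholder identifiers: replace TextFusionNet and the checkpoint path with
# whatever net.py and the "models" folder actually contain.
model = net.TextFusionNet()
model.load_state_dict(torch.load("models/your_checkpoint.pth", map_location="cpu"))
model.eval()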

To test

For the RGB and infrared image fusion (e.g., LLVIP):

python main_test_rgb_ir.py

Tip: If you are comparing our TextFusion with a purely appearance-based method, you can simply set the "description" to an empty string for a relatively fair comparison.
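A rough illustration of this tip (the variable name and the sample sentence are hypothetical; adapt them to however main_test_rgb_ir.py actually passes the text in):

# Hypothetical snippet: an empty description makes the comparison with
# appearance-based methods fairer, while a non-empty one steers the fusion.
description = ""
# description = "the pedestrians on the road"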

For the grayscale and infrared image fusion (e.g., TNO):

python main_test_gray_ir.py

Environment

  • Python 3.8.3
  • Torch 2.1.1
  • torchvision 0.16.1
  • opencv-python 4.8.1.78
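
One possible way to install the pinned dependencies (assuming a Python 3.8 environment is already active):

pip install torch==2.1.1 torchvision==0.16.1 opencv-python==4.8.1.78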

Updates

  • 2024-3-14: The training code is available, and the corresponding pre-generated association maps have been uploaded to Google Drive.
  • 2024-3-5: The test set of our IVT dataset is now available.
  • 2024-2-12: The pre-trained model and test files are now available!
  • 2024-2-8: The training set of our IVT dataset is now available.

Citation

If this work is helpful to you, please cite it as:

@article{cheng2023textfusion,
  title={TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion},
  author={Cheng, Chunyang and Xu, Tianyang and Wu, Xiao-Jun and Li, Hui and Li, Xi and Tang, Zhangyong and Kittler, Josef},
  journal={arXiv preprint arXiv:2312.14209},
  year={2023}
}

Our dataset is annotated based on the LLVIP dataset:

@inproceedings{jia2021llvip,
  title={LLVIP: A visible-infrared paired dataset for low-light vision},
  author={Jia, Xinyu and Zhu, Chuang and Li, Minzhen and Tang, Wenqi and Zhou, Wenli},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={3496--3504},
  year={2021}
}