Official code for 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Official repository for the paper "CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching".

🌟 For more details, please refer to the project page: https://caraj7.github.io/comat/.

[🌐 Webpage] [📖 Paper]

💥 News

  • [2024.04.05] 🚀 We release our paper on arXiv.

📌 TODO

  • Release training code in April.

👀 About CoMat

We propose 💫CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens.
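
The idea of scoring image-to-text alignment with a captioning model can be illustrated with a minimal sketch. It assumes a hypothetical captioner that, given a generated image, returns the probability of each prompt token conditioned on the image and the preceding tokens. The function names and the example probabilities below are illustrative assumptions, not part of the released code, and the loss shown (negative log-likelihood of the prompt) is one plausible way to express the alignment signal rather than the exact training objective.

```python
import math
from typing import Sequence


def caption_alignment_loss(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the prompt given the generated image.

    token_probs[i] is the probability a (hypothetical) captioning model
    assigns to prompt token i, conditioned on the image and on the
    preceding prompt tokens. A lower loss means the image supports the
    prompt better.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


def least_supported_token(tokens: Sequence[str], token_probs: Sequence[float]) -> str:
    """Return the prompt token the captioner finds least likely.

    This is the token the image most plausibly ignores.
    """
    if len(tokens) != len(token_probs) or not tokens:
        raise ValueError("tokens and token_probs must be non-empty and equal in length")
    index = min(range(len(tokens)), key=lambda i: token_probs[i])
    return tokens[index]


if __name__ == "__main__":
    # Made-up probabilities for illustration only: the image shows a cube
    # but the colour "red" is poorly supported.
    prompt_tokens = ["a", "red", "cube"]
    probs = [0.9, 0.05, 0.8]
    print("loss:", caption_alignment_loss(probs))
    print("least supported token:", least_supported_token(prompt_tokens, probs))
```

In a fine-tuning setup, a loss of this kind would be backpropagated through the captioner into the diffusion model, so that tokens with low probability contribute the largest gradient.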

✅ Citation

If you find CoMat useful for your research and applications, please kindly cite using this BibTeX:

@inproceedings{jiang2024comat,
  title={CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching},
  author={Dongzhi Jiang and Guanglu Song and Xiaoshi Wu and Renrui Zhang and Dazhong Shen and Zhuofan Zong and Yu Liu and Hongsheng Li},
  booktitle={arXiv},
  year={2024}
}