# DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Published at ICLR 2023. Paper link: DeCap
## Data

Download `coco_train` to `data/`.

Download `cc3m_train` to `data/`.
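DeCap is trained from captions alone, so only the text side of these files is needed. Below is a minimal sketch of pulling raw captions out of a COCO-style annotation file; the file name `data/coco_train.json` and the standard COCO `annotations`/`caption` layout are assumptions, not guarantees about what `coco_train` actually contains in this repo.

```python
import json

# Load a COCO-style caption annotation file (path and format are assumptions;
# adjust to whatever the downloaded coco_train file actually looks like).
with open("data/coco_train.json") as f:
    coco = json.load(f)

# Standard COCO caption files store one entry per caption under "annotations".
captions = [ann["caption"] for ann in coco["annotations"]]
print(f"{len(captions)} captions, e.g.: {captions[0]}")
```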
## Training

Run `./train_coco.sh` to train on COCO captions, or `./train_cc3m.sh` to train on CC3M.
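To make the text-only training idea concrete, here is a minimal sketch: a frozen CLIP text encoder embeds each caption, and a lightweight decoder is trained to reconstruct the caption from that embedding alone, so no images are ever used. This is an illustration of the technique, not the repository's actual architecture; the decoder sizes, learning rate, and the names `TextDecoder` and `train_step` are assumptions.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP stays frozen; only the decoder is trained


class TextDecoder(nn.Module):
    """Tiny autoregressive decoder conditioned on a CLIP text embedding.

    Sizes are illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, vocab_size=49408, dim=512, layers=4, heads=8):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)   # CLIP BPE vocab size
        self.pos = nn.Embedding(77, dim)           # CLIP context length
        block = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.dec = nn.TransformerDecoder(block, layers)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, clip_emb):
        T = tokens.size(1)
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        # The CLIP embedding is the only "memory" the decoder attends to.
        h = self.dec(x, memory=clip_emb.unsqueeze(1), tgt_mask=mask)
        return self.out(h)


decoder = TextDecoder().to(device)
opt = torch.optim.AdamW(decoder.parameters(), lr=1e-4)


def train_step(captions):
    tokens = clip.tokenize(captions, truncate=True).to(device)
    with torch.no_grad():  # text-only: images are never touched
        emb = clip_model.encode_text(tokens).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
    logits = decoder(tokens[:, :-1], emb)  # teacher forcing
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
        ignore_index=0,  # CLIP's tokenizer zero-pads; skip padded positions
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

A training loop would simply stream batches of the captions extracted in the data step through `train_step`.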
## Inference

See `inference_decap.ipynb`.
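The notebook is the authoritative version; the sketch below only illustrates the projection-based decoding idea from the paper. A pool of training captions is embedded once into a support memory, and at test time the CLIP image embedding is projected into the text-embedding space as a softmax-weighted sum of memory entries before being handed to the trained decoder. The function names and the temperature value are assumptions.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)


@torch.no_grad()
def build_support_memory(captions, batch_size=256):
    """Embed a pool of training captions once; this is the support memory."""
    chunks = []
    for i in range(0, len(captions), batch_size):
        toks = clip.tokenize(captions[i:i + batch_size], truncate=True).to(device)
        e = clip_model.encode_text(toks).float()
        chunks.append(e / e.norm(dim=-1, keepdim=True))
    return torch.cat(chunks)


@torch.no_grad()
def project_image(image_path, memory, temperature=0.01):
    """Project a CLIP image embedding into the text-embedding space as a
    softmax-weighted sum of support-memory text embeddings."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    img = clip_model.encode_image(image).float()
    img = img / img.norm(dim=-1, keepdim=True)
    weights = (img @ memory.T / temperature).softmax(dim=-1)
    proj = weights @ memory
    return proj / proj.norm(dim=-1, keepdim=True)

# The projected embedding is then fed to the trained text decoder in place of
# a CLIP text embedding, e.g. with greedy autoregressive decoding.
```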
## Pretrained models

- Trained on COCO captions: `model_coco`
- Trained on CC3M: coming soon
## Citation

@inproceedings{lidecap,
  title={DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training},
  author={Li, Wei and Zhu, Linchao and Wen, Longyin and Yang, Yi},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}
## Acknowledgments

This repository is heavily based on ClipCap. For training we used data from the COCO dataset and Conceptual Captions (CC3M).