# DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Published at ICLR 2023. Paper link: DeCap
## Data

Download `coco_train` to `data/`.

Download `cc3m_train` to `data/`.
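DeCap is trained from captions alone, so only the text side of these files is needed. Below is a minimal sketch of pulling raw captions out of a COCO-style annotation file; the file name `data/coco_train.json` and the standard COCO `annotations`/`caption` layout are assumptions, not guarantees about what `coco_train` actually contains in this repo.

```python
import json

# Load a COCO-style caption annotation file (path and format are assumptions;
# adjust to whatever the downloaded coco_train file actually looks like).
with open("data/coco_train.json") as f:
    coco = json.load(f)

# Standard COCO caption files store one entry per caption under "annotations".
captions = [ann["caption"] for ann in coco["annotations"]]
print(f"{len(captions)} captions, e.g.: {captions[0]}")
```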
## Training

Run `./train_coco.sh` to train on COCO captions, or `./train_cc3m.sh` to train on CC3M.
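To make the text-only training idea concrete, here is a minimal sketch: a frozen CLIP text encoder embeds each caption, and a lightweight decoder is trained to reconstruct the caption from that embedding alone, so no images are ever used. This is an illustration of the technique, not the repository's actual architecture; the decoder sizes, learning rate, and the names `TextDecoder` and `train_step` are assumptions.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP stays frozen; only the decoder is trained


class TextDecoder(nn.Module):
    """Tiny autoregressive decoder conditioned on a CLIP text embedding.

    Sizes are illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, vocab_size=49408, dim=512, layers=4, heads=8):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)   # CLIP BPE vocab size
        self.pos = nn.Embedding(77, dim)           # CLIP context length
        block = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.dec = nn.TransformerDecoder(block, layers)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, clip_emb):
        T = tokens.size(1)
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        # The CLIP embedding is the only "memory" the decoder attends to.
        h = self.dec(x, memory=clip_emb.unsqueeze(1), tgt_mask=mask)
        return self.out(h)


decoder = TextDecoder().to(device)
opt = torch.optim.AdamW(decoder.parameters(), lr=1e-4)


def train_step(captions):
    tokens = clip.tokenize(captions, truncate=True).to(device)
    with torch.no_grad():  # text-only: images are never touched
        emb = clip_model.encode_text(tokens).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
    logits = decoder(tokens[:, :-1], emb)  # teacher forcing
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
        ignore_index=0,  # CLIP's tokenizer zero-pads; skip padded positions
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

A training loop would simply stream batches of the captions extracted in the data step through `train_step`.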
## Inference

See `inference_decap.ipynb`.
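The notebook is the authoritative version; the sketch below only illustrates the projection-based decoding idea from the paper. A pool of training captions is embedded once into a support memory, and at test time the CLIP image embedding is projected into the text-embedding space as a softmax-weighted sum of memory entries before being handed to the trained decoder. The function names and the temperature value are assumptions.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)


@torch.no_grad()
def build_support_memory(captions, batch_size=256):
    """Embed a pool of training captions once; this is the support memory."""
    chunks = []
    for i in range(0, len(captions), batch_size):
        toks = clip.tokenize(captions[i:i + batch_size], truncate=True).to(device)
        e = clip_model.encode_text(toks).float()
        chunks.append(e / e.norm(dim=-1, keepdim=True))
    return torch.cat(chunks)


@torch.no_grad()
def project_image(image_path, memory, temperature=0.01):
    """Project a CLIP image embedding into the text-embedding space as a
    softmax-weighted sum of support-memory text embeddings."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    img = clip_model.encode_image(image).float()
    img = img / img.norm(dim=-1, keepdim=True)
    weights = (img @ memory.T / temperature).softmax(dim=-1)
    proj = weights @ memory
    return proj / proj.norm(dim=-1, keepdim=True)

# The projected embedding is then fed to the trained text decoder in place of
# a CLIP text embedding, e.g. with greedy autoregressive decoding.
```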
## Pretrained models

- Trained on COCO captions: `model_coco`
- Trained on CC3M: coming soon
## Citation

@inproceedings{lidecap,
  title={DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training},
  author={Li, Wei and Zhu, Linchao and Wen, Longyin and Yang, Yi},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}
## Acknowledgments

This repository is heavily based on ClipCap. For training we used data from the COCO dataset and Conceptual Captions (CC3M).