AAAI 2024 accepted paper: Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

MacCap

Overview

Setup

First, download and set up the repo:

git clone https://github.com/Artanic30/MacCap
cd MacCap
conda env create -f environment.yml
conda activate MacCap

Data preparation

Download coco_train and cc3m_train and place both in the data directory.
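Before launching training, it can help to confirm that both downloads actually landed in the data directory. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `report` are hypothetical, and since the exact file names or extensions of the downloads are not stated here, it only checks that some file or folder in the directory starts with `coco_train` or `cc3m_train`.

```python
from pathlib import Path

# Names taken from the data preparation step; the real downloads may carry
# an extension or be folders, so matching is done by prefix (an assumption).
EXPECTED = ("coco_train", "cc3m_train")


def find_missing(data_dir, expected=EXPECTED):
    """Return the expected names with no matching file or folder in data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # A missing directory means nothing has been downloaded yet.
        return list(expected)
    present = [p.name for p in root.iterdir()]
    return [name for name in expected
            if not any(entry.startswith(name) for entry in present)]


def report(data_dir="data"):
    """Print a short status line and return the list of missing names."""
    missing = find_missing(data_dir)
    if missing:
        print(f"Missing from {data_dir}: {', '.join(missing)}")
    else:
        print(f"All expected training data found in {data_dir}")
    return missing
```

Calling `report()` from the repository root checks the default `data` directory; an empty return value means both entries were found.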

Training

./train_coco.sh

or

./train_cc3m.sh
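If training is started from a Python driver rather than the shell, the choice between the two scripts can be wrapped in a small helper. This is only a sketch under stated assumptions: the script names come from the commands above, while the dataset keys `coco` and `cc3m` and the functions `train_script_for` and `launch` are hypothetical names introduced for illustration, not options defined by the repository.

```python
import subprocess

# Script names come from the Training section; the dictionary keys are
# labels chosen for this sketch, not options defined by the repository.
TRAIN_SCRIPTS = {"coco": "./train_coco.sh", "cc3m": "./train_cc3m.sh"}


def train_script_for(dataset):
    """Map a dataset label to its training script, rejecting unknown labels."""
    key = dataset.strip().lower()
    if key not in TRAIN_SCRIPTS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(TRAIN_SCRIPTS)}"
        )
    return TRAIN_SCRIPTS[key]


def launch(dataset, repo_dir="."):
    """Run the matching script from repo_dir and raise if it exits non-zero."""
    return subprocess.run([train_script_for(dataset)], cwd=repo_dir, check=True)
```

For example, `launch("coco")` run from the repository root is equivalent to `./train_coco.sh`, assuming the script is executable and the MacCap conda environment is active.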

Evaluation

Follow the instructions here to evaluate the generated captions.

Citation

@article{qiu2024mining,
  title={Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training},
  author={Qiu, Longtian and Ning, Shan and He, Xuming},
  journal={arXiv preprint arXiv:2401.02347},
  year={2024}
}

Acknowledgments

This repository is heavily based on ClipCap and DeCap. For training, we used data from the COCO dataset and Conceptual Captions.

Release Schedule

  • Initial Code release
  • Detail Document
  • Data Preparation
  • Training and Evaluation Scripts
  • Checkpoints