ModeCap

Controllable image captioning model with unsupervised modes

Learning Distinct and Representative Modes for Image Captioning (NeurIPS 2022)

This repo provides the implementation of the paper "Learning Distinct and Representative Modes for Image Captioning".

Install

pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
pip install transformers yacs scipy

Data

Prepare the data by following the instructions in VLP.

Run

python -m modecap.train data_dir PATH_TO_DATA
python -m modecap.inference data_dir PATH_TO_DATA model_path PATH_TO_MODEL
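
The commands above pass options as alternating KEY VALUE tokens rather than `--flag` arguments. This resembles the override-list convention that yacs supports through `CfgNode.merge_from_list`, but whether modecap parses its arguments that way is an assumption, not something this README states. The following is a minimal standard-library sketch of how such a token list can be read as key/value pairs; the helper name `pairs_to_overrides` is hypothetical and is not part of the repo.

```python
from typing import Dict, List


def pairs_to_overrides(tokens: List[str]) -> Dict[str, str]:
    """Read a flat [key, value, key, value, ...] list as a dict of overrides.

    Hypothetical helper for illustration only; not taken from modecap.
    """
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected KEY VALUE pairs, got {len(tokens)} tokens")
    # Even positions hold keys, odd positions hold the value that follows each key.
    return dict(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    # Tokens as they appear after `python -m modecap.inference` in the command above.
    argv = ["data_dir", "PATH_TO_DATA", "model_path", "PATH_TO_MODEL"]
    print(pairs_to_overrides(argv))
```

Under this reading, a missing value (an odd number of tokens) is reported as an error instead of being silently dropped, which makes a mistyped command easier to spot.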