
Code for paper "To be an Artist: Automatic Generation on Food Image Aesthetic Captioning" (ICTAI 2020 Oral)


Automatic Generation on Food Image Aesthetic Captioning

PyTorch implementation of the paper To be an Artist: Automatic Generation on Food Image Aesthetic Captioning (ICTAI 2020).

Code is provided as-is, no updates expected.

Model Overview

Requirements

Make sure your environment provides:

  • Python 3.5+
  • Java 1.8.0 (for computing METEOR and SPICE)

Then install the Python dependencies:

pip install -r requirements.txt


Usage

Configuration

Hyperparameters and options can be configured in config.py; see that file for details.
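For orientation, a config file for a captioning model of this kind might look like the fragment below. All names and values here are illustrative assumptions, not the repository's actual settings; consult config.py for the real options.

```python
# Hypothetical config.py fragment -- names and values are illustrative only.
data_dir = "data/food_iac"   # where preprocess.py stores its output
embed_dim = 512              # word-embedding dimension
attention_dim = 512          # attention-layer dimension
decoder_dim = 512            # decoder hidden size
dropout = 0.5
batch_size = 32
learning_rate = 4e-4
beam_size = 5                # default beam width for testing/inference
```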

Preprocess

Preprocess the images along with their captions and store them locally:

python preprocess.py
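As a rough illustration of what caption preprocessing typically involves (the repository's preprocess.py may differ), the sketch below builds a word-to-index vocabulary and encodes captions as fixed-length index sequences; the special tokens and function names are assumptions.

```python
from collections import Counter

def build_vocab(captions, min_freq=1):
    """Map each sufficiently frequent word to an integer index."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, freq in counts.items():
        if freq >= min_freq:
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(caption, vocab, max_len=10):
    """Encode a caption as <start> w1 ... wn <end>, padded to max_len."""
    ids = [vocab["<start>"]]
    ids += [vocab.get(w, vocab["<unk>"]) for w in caption.lower().split()]
    ids.append(vocab["<end>"])
    ids += [vocab["<pad>"]] * (max_len - len(ids))
    return ids[:max_len]

captions = ["bright colors on a white plate", "soft natural light"]
vocab = build_vocab(captions)
print(encode(captions[1], vocab))
```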

Single-Aspect Captioning Module

The single-aspect captioning module generates a caption for each aesthetic attribute and learns the corresponding feature representations.

To train, run:

python single_train.py

To test and compute metrics, set beam_size in single_test.py, then run:

python single_test.py

To run inference, set image_path and beam_size in single_infer.py, then run:

python single_infer.py
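The beam_size setting controls beam-search decoding. Conceptually (a toy sketch, not the repository's implementation), beam search keeps the k highest-scoring partial captions at each step; the dummy step_fn below stands in for the trained decoder.

```python
import math

def beam_search(step_fn, start, beam_size, max_len, end_token):
    """Toy beam search: step_fn(seq) -> {token: prob} for the next token."""
    beams = [([start], 0.0)]  # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:         # finished beams carry over
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Dummy "model": prefers token 7, forces the end token after three tokens.
def step_fn(seq):
    if len(seq) >= 3:
        return {99: 1.0}
    return {7: 0.6, 8: 0.4}

print(beam_search(step_fn, start=0, beam_size=2, max_len=5, end_token=99))
```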

Multi-Aspect Captioning Module

The multi-aspect captioning module learns the associations among all feature representations and automatically aggregates the captions of the aesthetic attributes into a final sentence.

To train, run:

python multi_train.py

To test and compute metrics, set model_path and multi_beam_k in multi_test.py, then run:

python multi_test.py

To run inference, set image_path and multi_beam_k in multi_infer.py, then run:

python multi_infer.py
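The module itself aggregates learned feature representations inside the network; as a purely illustrative stand-in for the aggregation idea, joining per-aspect captions into one sentence could look like this (the function and the sample captions are invented for illustration):

```python
def aggregate(aspect_captions):
    """Naively join per-aspect captions into one sentence (illustration only;
    the actual module fuses learned representations, not strings)."""
    parts = [c.rstrip(".") for c in aspect_captions.values()]
    if len(parts) == 1:
        return parts[0] + "."
    return ", ".join(parts[:-1]) + ", and " + parts[-1] + "."

captions = {
    "color": "the colors are vivid",
    "composition": "the plating is well balanced",
    "light": "the lighting is soft",
}
print(aggregate(captions))
```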


Dataset

A dataset for food image aesthetic captioning was constructed to evaluate the proposed method; see here for details.


NOTES

  • Following the experimental settings of a previous work, we first pre-trained our single-aspect captioning module on the MSCOCO image captioning dataset and then fine-tuned it on our dataset.
  • The load_embeddings method (in src/utils/embedding.py) will try to create a cache for loaded embeddings under the folder dataset_output_path, which dramatically speeds up loading on subsequent runs.
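The caching idea behind load_embeddings can be sketched as follows. This is a simplified stand-in, not the repository's code; the GloVe-style text format and pickle cache are assumptions.

```python
import os
import pickle

def load_embeddings(txt_path, cache_dir):
    """Parse a GloVe-style text file of word vectors, caching the result."""
    cache_path = os.path.join(cache_dir, os.path.basename(txt_path) + ".pkl")
    if os.path.exists(cache_path):               # fast path: reuse the cache
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    embeddings = {}
    with open(txt_path, encoding="utf-8") as f:  # slow path: parse the text
        for line in f:
            word, *vec = line.rstrip().split(" ")
            embeddings[word] = [float(x) for x in vec]
    with open(cache_path, "wb") as f:            # write the cache for next time
        pickle.dump(embeddings, f)
    return embeddings
```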
  • You will first need to download the Stanford CoreNLP 3.6.0 code and models, which are required by SPICE. To do this, run: cd src/metrics && bash get_stanford_models.sh.


Acknowledgements