
Image Caption Model with Attention

Authors' original Theano implementation: https://github.com/kelvinxu/arctic-captions

Yunjey's TensorFlow implementation: https://github.com/yunjey/show-attend-and-tell


ruotianluo's image captioning code: https://github.com/ruotianluo/ImageCaptioning.pytorch

Getting started


To use this code, you need to install:

  • Python 3.6
  • PyTorch 0.4 along with torchvision
  • matplotlib
  • tensorboardX
  • numpy
  • pycocotools
  • imageio
  • scikit-image
  • h5py

You can install pycocotools directly with pip or compile it from source. You also need to download the pycocoevalcap code.
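As a quick sanity check before preparing the data, the sketch below reports which of the packages listed above cannot be found by the current interpreter. The import names are an assumption based on each package's usual name (for example, scikit-image is imported as skimage); pycocoevalcap is left out because it is downloaded as source code rather than installed.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "matplotlib", "tensorboardX", "numpy",
    "pycocotools", "imageio", "skimage", "h5py",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    importlib.util.find_spec only locates a module; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or newer is expected, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```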

Prepare Dataset

First, download the COCO images from the link. You need the 2014 training images and the 2014 validation images. Then download the preprocessed COCO captions from the link and extract dataset_coco.json from the zip file into data/. Finally, invoke the scripts/prepo.py script, which creates the dataset (an HDF5 file and a JSON file).

$ python scripts/prepo.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk.h5 --word_count_threshold 5 --images_root data
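The --word_count_threshold 5 flag controls vocabulary pruning. The sketch below assumes the neuraltalk2-style behaviour, where words that occur at most threshold times are dropped from the vocabulary and replaced by a single UNK token; the function names are hypothetical and this is not necessarily how scripts/prepo.py implements it.

```python
from collections import Counter


def build_vocab(tokenized_captions, count_thr=5):
    """Keep words seen more than count_thr times (assumed neuraltalk2 rule).

    If any word is rarer than that, a single "UNK" token is appended to
    stand in for all of them.
    """
    counts = Counter(w for caption in tokenized_captions for w in caption)
    vocab = sorted(w for w, n in counts.items() if n > count_thr)
    if any(n <= count_thr for n in counts.values()):
        vocab.append("UNK")
    return vocab


def encode(caption, vocab):
    """Map out-of-vocabulary words in a tokenized caption to "UNK"."""
    known = set(vocab)
    return [w if w in known else "UNK" for w in caption]
```

For example, with six captions "a dog" and one caption "a zebra", "zebra" occurs only once, so build_vocab returns ["a", "dog", "UNK"] and "a zebra" is encoded as ["a", "UNK"]. A higher threshold gives a smaller vocabulary at the cost of more UNK tokens.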

Warning: the prepro script will fail on the default MSCOCO data because one of the images is corrupted. See this issue for the fix; it involves manually replacing one image in the dataset.
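If you want to look for damaged files yourself, a cheap heuristic is to check JPEG markers: a complete JPEG starts with the SOI marker (FF D8) and normally ends with the EOI marker (FF D9). The sketch below flags files missing either marker. It is an assumption-based helper, not the fix from the issue: it only catches truncated files, may flag valid JPEGs that carry trailing padding, and the affected image still has to be replaced as described above.

```python
import os


def looks_truncated_jpeg(path):
    """Heuristic: True if the file lacks the JPEG SOI or EOI marker."""
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < 4:  # too small to hold both markers
            return True
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head != b"\xff\xd8" or tail != b"\xff\xd9"


def find_suspect_images(root):
    """Walk root and return the .jpg/.jpeg paths that fail the marker check."""
    suspects = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                if looks_truncated_jpeg(path):
                    suspects.append(path)
    return suspects
```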

Start training

$ python train.py --id st --input_json data/cocotalk.json --input_h5 data/cocotalk.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --checkpoint_path log_st --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 25
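The --learning_rate_decay_start 0 flag suggests a step decay schedule. The sketch below is modelled on the schedule used in ruotianluo's ImageCaptioning.pytorch, which this repo may or may not follow exactly. The decay_every and decay_rate values are assumed defaults that do not appear in the command above, and a negative decay_start disables decay.

```python
def current_lr(epoch, learning_rate=5e-4, decay_start=0, decay_every=3,
               decay_rate=0.8):
    """Step-decayed learning rate for a given epoch (assumed schedule).

    After decay_start, the rate is multiplied by decay_rate once every
    decay_every epochs.
    """
    if decay_start >= 0 and epoch > decay_start:
        frac = (epoch - decay_start) // decay_every
        return learning_rate * decay_rate ** frac
    return learning_rate


if __name__ == "__main__":
    for epoch in (0, 3, 6, 24):
        print(epoch, current_lr(epoch))
```

Under these assumptions, the rate stays at 5e-4 for epochs 0 to 2 and is then multiplied by 0.8 every three epochs until --max_epochs is reached.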

The train script dumps checkpoints into the folder specified by --checkpoint_path (default: save/). To save disk space, only the best-performing checkpoint on validation and the latest checkpoint are kept.
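The "best plus latest" policy can be sketched as a small helper. CheckpointKeeper and the model-best.pth file name are hypothetical. The sketch uses pickle to stay dependency-free where the real script would likely call torch.save, and it treats a higher validation score as better (for example CIDEr). To track cross-entropy loss instead, pass its negative.

```python
import os
import pickle


class CheckpointKeeper:
    """Hypothetical helper: always overwrite the latest checkpoint, and
    overwrite the best one only when the validation score improves."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.best_score = None
        os.makedirs(checkpoint_path, exist_ok=True)

    def _dump(self, obj, name):
        # pickle stands in for torch.save in this sketch
        with open(os.path.join(self.checkpoint_path, name), "wb") as f:
            pickle.dump(obj, f)

    def save(self, state, val_score):
        """Save state as latest; return True if it is also the new best."""
        self._dump(state, "model.pth")
        if self.best_score is None or val_score > self.best_score:
            self.best_score = val_score
            self._dump(state, "model-best.pth")
            return True
        return False
```

Only two files exist on disk at any time, regardless of how many times --save_checkpoint_every triggers.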

To resume training, set the --start_from option to the directory where infos.pth and model.pth are saved.
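Before resuming, it can help to confirm that the --start_from directory actually contains both files. The helper below is a hypothetical pre-flight check. It does not show how train.py loads the files; with PyTorch, the model weights would typically be restored with model.load_state_dict(torch.load(path)).

```python
import os


def resume_files(start_from, names=("infos.pth", "model.pth")):
    """Return {name: full path} for the files needed to resume training.

    Raises FileNotFoundError listing any missing file.
    """
    paths = {name: os.path.join(start_from, name) for name in names}
    missing = [p for p in paths.values() if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("cannot resume, missing: " + ", ".join(missing))
    return paths
```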

To evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option. Don't forget to download the coco-caption code first.
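Language evaluation with the coco-caption toolkit works on a results file: a JSON list of {"image_id", "caption"} records, which is loaded with COCO.loadRes and scored with COCOEvalCap. The sketch below shows that flow outside the training loop. The function names and the {image_id: caption} input are illustrative assumptions. The toolkit imports are deferred so that the module loads even when pycocotools or pycocoevalcap is not installed.

```python
import json


def write_coco_results(predictions, path):
    """Write {image_id: caption} pairs in the COCO caption results format.

    Returns the number of records written.
    """
    records = [{"image_id": int(image_id), "caption": caption}
               for image_id, caption in sorted(predictions.items())]
    with open(path, "w") as f:
        json.dump(records, f)
    return len(records)


def language_eval(annotation_file, results_file):
    """Score a results file against COCO reference captions."""
    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    coco = COCO(annotation_file)
    coco_res = coco.loadRes(results_file)
    coco_eval = COCOEvalCap(coco, coco_res)
    # Restrict scoring to the images that actually have predictions.
    coco_eval.params["image_id"] = coco_res.getImgIds()
    coco_eval.evaluate()
    return dict(coco_eval.eval)  # e.g. "Bleu_4", "METEOR", "CIDEr"
```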

Evaluate on Karpathy's test split

$ python eval.py --model model.pth --infos_path infos.pkl --language_eval 1 --num_images 5000
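Karpathy's split is encoded in dataset_coco.json itself. Each entry in its "images" list carries a "split" field ("train", "restval", "val" or "test") along with "cocoid" and "filename". The standard test split holds 5,000 images, which matches --num_images 5000. The sketch below, with hypothetical helper names, inspects the split without touching the model.

```python
import json
from collections import Counter


def load_karpathy_dataset(path="data/dataset_coco.json"):
    """Load the Karpathy-split caption file extracted into data/."""
    with open(path) as f:
        return json.load(f)


def split_counts(dataset):
    """Count images per split."""
    return Counter(img["split"] for img in dataset["images"])


def images_in_split(dataset, split="test"):
    """Return (cocoid, filename) pairs for one split, in file order."""
    return [(img["cocoid"], img["filename"])
            for img in dataset["images"] if img["split"] == split]
```

For example, split_counts(load_karpathy_dataset())["test"] gives the number of test images you would expect eval.py to cover.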