- This code is a PyTorch implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (a minimal sketch of the attention step follows the references below).
- Uses ResNet as the image encoder.
- Uses Karpathy's train/val/test split.
References:
- Author's original Theano implementation: https://github.com/kelvinxu/arctic-captions
- Yunjey's TensorFlow implementation: https://github.com/yunjey/show-attend-and-tell
- neuraltalk2: https://github.com/karpathy/neuraltalk2
- ruotianluo's image captioning code: https://github.com/ruotianluo/ImageCaptioning.pytorch
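Below is a minimal sketch of the soft attention step described in the paper, written in PyTorch. The module and parameter names (SoftAttention, feat_dim, hidden_dim, att_dim) are illustrative and do not correspond to this repo's actual classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) soft attention over CNN spatial features."""
    def __init__(self, feat_dim, hidden_dim, att_dim):
        super(SoftAttention, self).__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)      # project image regions
        self.hidden_proj = nn.Linear(hidden_dim, att_dim)  # project LSTM state
        self.score = nn.Linear(att_dim, 1)                 # scalar relevance score

    def forward(self, feats, hidden):
        # feats:  (batch, regions, feat_dim) spatial features from the CNN
        # hidden: (batch, hidden_dim) previous decoder LSTM state
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        ))                                    # (batch, regions, 1)
        alpha = F.softmax(e, dim=1)           # weights sum to 1 over regions
        context = (alpha * feats).sum(dim=1)  # (batch, feat_dim) weighted context
        return context, alpha.squeeze(-1)
```

At each decoding step, the LSTM state is used to weight the encoder's spatial features, and the resulting context vector conditions the next word prediction.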
To use this code, you need to install:
- Python 3.6
- PyTorch 0.4 along with torchvision
- matplotlib
- tensorboardX
- numpy
- pycocotools
- imageio
- scikit-image
- h5py
You can install pycocotools directly with pip or compile it from source. Also download the pycocoevalcap code.
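For example, the pip-installable dependencies above can be set up in one go (versions are not pinned here; adjust to your environment):

```bash
$ pip install matplotlib tensorboardX numpy imageio scikit-image h5py
# pycocotools can come from pip, or be compiled from the cocoapi source
$ pip install pycocotools
```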
First, download the COCO images from link. We need the 2014 training images and the 2014 validation images.
Then download the preprocessed COCO captions from link and extract dataset_coco.json from the zip file into data/.
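For example (the archive filename below is illustrative; use whatever the download actually gives you):

```bash
$ mkdir -p data
$ unzip captions.zip dataset_coco.json -d data/  # extract only dataset_coco.json into data/
```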
Then invoke the scripts/prepo.py script, which will create the dataset (an hdf5 file and a json file):
$ python scripts/prepo.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk.h5 --word_count_threshold 5 --images_root data
Warning: the prepro script will fail with the default MSCOCO data because one of the images is corrupted. See this issue for the fix; it involves manually replacing one image in the dataset.
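Once preprocessing succeeds, you can sanity-check the outputs. The field names below (ix_to_word, images, labels) are what prepro scripts in this family of captioning repos typically write; treat them as an assumption, not a guarantee:

```python
import json
import h5py

# Inspect the vocab/metadata json produced by the prepro script
info = json.load(open('data/cocotalk.json'))
print('vocab size:', len(info['ix_to_word']))  # index -> word mapping
print('num images:', len(info['images']))

# Inspect the hdf5 file of encoded captions
with h5py.File('data/cocotalk.h5', 'r') as f:
    print('datasets:', list(f.keys()))
    if 'labels' in f:
        print('encoded captions:', f['labels'].shape)  # (num_captions, seq_length)
```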
Then start training:
$ python train.py --id st --input_json data/cocotalk.json --input_h5 data/cocotalk.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --checkpoint_path log_st --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 25
The train script will dump checkpoints into the folder specified by --checkpoint_path (default: save/). To save disk space, we only keep the best-performing checkpoint on validation and the latest checkpoint. To resume training, set the --start_from option to the path containing infos.pth and model.pth.
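For example, to resume the run above from the checkpoints saved in log_st (the other flags simply repeat the original run's settings):

```bash
$ python train.py --id st --input_json data/cocotalk.json --input_h5 data/cocotalk.h5 --batch_size 10 --learning_rate 5e-4 --checkpoint_path log_st --start_from log_st
```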
If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to download the coco-caption code.
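The coco-caption code is usually cloned into the project root; the repository below is the commonly used one, but check this repo's evaluation code for the exact path it expects:

```bash
$ git clone https://github.com/tylin/coco-caption.git
```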
To evaluate a trained model:
$ python eval.py --model model.pth --infos_path infos.pkl --language_eval 1 --num_images 5000