
PyTorch implementation of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".

UMich Fall 2021 EECS 545 final project.

Running guide:

  • Download the dataset: COCO 2014 or VizWiz.
  • Download the Python API for COCO or VizWiz.
  • Run the preprocessing script to build the vocabulary and resize the images.
  • Run the training script to start training.
  • Run the test script, passing in the checkpoint and vocabulary paths. This script can predict a caption for a given image and report BLEU scores on a given validation set.
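
To illustrate the vocabulary-building step, here is a minimal sketch of how captions might be turned into a word-to-id mapping. It is not the code in this repository: the function names (build_vocab, encode), the special tokens, and the min_count threshold are all assumptions chosen for illustration, and tokenization is a plain whitespace split.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual tokens may differ.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(captions, min_count=2):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter()
    for caption in captions:
        counts.update(caption.lower().split())
    # Special tokens take the first ids; kept words are sorted for determinism.
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + words)}


def encode(caption, vocab):
    """Convert a caption to ids wrapped in <start>/<end>; rare words map to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(w, unk) for w in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]


# Toy captions standing in for the dataset annotations.
captions = ["a dog runs on grass", "a dog sits", "a cat sits on a mat"]
vocab = build_vocab(captions, min_count=2)
print(encode("a dog flies", vocab))  # [1, 4, 5, 3, 2]
```

Dropping rare words keeps the output layer of the decoder small; the threshold is a trade-off between vocabulary coverage and model size.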

Our trained network parameters and vocabulary files for both datasets have been uploaded to Google Drive. The models were trained for 3 days on COCO (~30 epochs) and VizWiz (~100 epochs) using the Great Lakes servers.