Image-Caption-Generator

A simple implementation of a neural image caption generator.

Primary language: Python. License: MIT.

Please note that the code in this repo is intended for use in talks and workshops. It leaves a lot of room for improvement, in both accuracy and efficiency, so that these aspects can be discussed during the sessions.

For a stable and accurate implementation, refer to TensorFlow's im2txt model.

Setup

Create Directories

  • Run ./scripts/mkdir.sh

Downloading Datasets

  • Run ./scripts/download_images.sh
  • This downloads the Flickr8k dataset

Downloading Models

  • The VGG16 model is downloaded automatically the first time the model is trained, and is then cached on disk.
  • Alternatively, run python3 vgg16.py. This downloads the VGG16 model, produces the embedding for a test image, and compares it with a pre-computed embedding.
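
The check described above can be sketched roughly as follows. This is a minimal illustration, not the contents of vgg16.py: it assumes the Keras build of VGG16 that ships with TensorFlow, takes the fc2 layer as the embedding, and compares vectors by cosine similarity. The function names and the 0.999 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def embeddings_match(a: Sequence[float], b: Sequence[float],
                     min_similarity: float = 0.999) -> bool:
    """True if the two embeddings point in (almost) the same direction.

    The threshold is an assumption chosen for illustration.
    """
    return cosine_similarity(a, b) >= min_similarity


def compute_vgg16_embedding(image_path: str) -> List[float]:
    """Return the fc2 activations of ImageNet-trained VGG16 for one image.

    Assumption: TensorFlow is installed. Keras downloads the weights on
    first use and reuses the cached copy afterwards.
    """
    # Imported lazily so the comparison helpers work without TensorFlow.
    import numpy as np
    import tensorflow as tf

    base = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
    # Cut the network at fc2 to get a 4096-dimensional feature vector.
    encoder = tf.keras.Model(inputs=base.input,
                             outputs=base.get_layer("fc2").output)
    image = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(tf.keras.utils.img_to_array(image), axis=0)
    batch = tf.keras.applications.vgg16.preprocess_input(batch)
    return encoder.predict(batch)[0].tolist()
```

With a stored reference vector, the check would then be embeddings_match(compute_vgg16_embedding(path), reference). Cosine similarity is used here because it ignores overall scale; an element-wise tolerance check would be an equally reasonable choice.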

Processing Images

  • Update data_dir in code/preprocess.py and set mode_list=["train", "test", "debug"]
  • Run python3 preprocess.py
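
As a rough illustration, the two settings might look like this once edited. The path is a placeholder, and describe_settings is a hypothetical helper (not part of preprocess.py) that only reports whether the directory exists before a long run is started.

```python
from pathlib import Path
from typing import List

# Placeholder path: replace with the directory that holds the downloaded data.
data_dir = Path("/path/to/data")

# Splits to preprocess, as listed in the step above.
mode_list = ["train", "test", "debug"]


def describe_settings(data_dir: Path, mode_list: List[str]) -> str:
    """Summarise the settings so a wrong path is noticed early."""
    status = "found" if data_dir.is_dir() else "MISSING"
    return f"data_dir={data_dir} ({status}); modes={', '.join(mode_list)}"


print(describe_settings(data_dir, mode_list))
```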

Train

  • Run python3 train.py