Image Captioner in Keras

An image caption service combining a convolutional neural network (image in) and a recurrent neural network (text out). The service uses Keras as the deep learning framework, is trained on the Flickr8K dataset, and is evaluated with the BLEU score.
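At prediction time, the recurrent part generates the caption one word at a time, conditioned on the CNN's image features. Here is a minimal sketch of that greedy decoding loop; `predict_next` is a hypothetical stand-in for the trained Keras model's predict step, replaced below by a stub just to exercise the loop:

```python
def greedy_caption(predict_next, photo_features, max_len=34):
    """Greedy decoding: start from the 'startseq' token, repeatedly ask
    the model for the most likely next word, and stop at 'endseq' or
    after max_len words. predict_next(words, photo_features) stands in
    for the trained Keras model's predict call (hypothetical signature)."""
    words = ["startseq"]
    for _ in range(max_len):
        word = predict_next(words, photo_features)
        if word == "endseq":
            break
        words.append(word)
    return " ".join(words[1:])  # drop the 'startseq' sentinel

# Stub "model" that emits a canned caption, just to exercise the loop
canned = iter(["a", "dog", "runs", "endseq"])
caption = greedy_caption(lambda words, feats: next(canned), photo_features=None)
print(caption)  # a dog runs
```

Greedy decoding picks the single most likely word at each step; a beam search over several candidate sequences usually produces slightly better captions at the cost of more compute.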

Here's a sample caption produced by the neural network after being fed an image:

[Sample output: an image with its generated caption]

INSTRUCTIONS

  • Download the Flickr8K Dataset and Flickr8k text
  • Run the extract_features() method in the captioner notebook to derive features.pkl, and the description-preparation steps to produce descriptions.txt. Both can run locally on a CPU; extracting features from the 8,092 Flickr images takes about an hour.
  • Copy the rest of the captioner notebook, along with features.pkl and descriptions.txt, to a cloud platform such as AWS, Google Colab or FloydHub to train the model on GPUs. Training saves the model to disk, and you can then download it for local use.
  • Run the evaluator and predictor notebook locally with the trained model to evaluate its performance and generate captions for unseen images on your local drive.
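The two intermediate files above are simple formats: descriptions.txt comes from parsing Flickr8k's token file (lines of `<image>.jpg#<n>	<caption>`) into per-image caption lists, and features.pkl is a pickled dictionary mapping image IDs to feature vectors. A rough sketch, with a tiny inline sample and an in-memory buffer standing in for the real files:

```python
import io
import pickle

def parse_descriptions(token_text):
    """Parse Flickr8k token lines of the form
    '<image>.jpg#<n>\t<caption>' into {image_id: [captions]}."""
    mapping = {}
    for line in token_text.strip().split("\n"):
        image_part, caption = line.split("\t")
        image_id = image_part.split(".")[0]  # drop the '.jpg#n' suffix
        mapping.setdefault(image_id, []).append(caption.lower())
    return mapping

# Tiny inline sample standing in for the real Flickr8k text file
sample = ("1000268201_693b08cb0e.jpg#0\tA child in a pink dress .\n"
          "1000268201_693b08cb0e.jpg#1\tA girl going into a wooden building .")
descriptions = parse_descriptions(sample)

# features.pkl is just a pickled {image_id: feature_vector} mapping;
# an in-memory buffer stands in for the file on disk here
buf = io.BytesIO()
pickle.dump({"1000268201_693b08cb0e": [0.1, 0.2]}, buf)
buf.seek(0)
features = pickle.load(buf)
```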

(Of course, you could run everything in the cloud if you wish.)
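For intuition on the evaluation step, here is a simplified sentence-level BLEU-1: clipped unigram precision times a brevity penalty. The actual notebooks would typically rely on a library implementation such as NLTK's corpus BLEU, which also combines higher-order n-grams; this sketch only shows the core idea:

```python
import math
from collections import Counter

def bleu1(candidate, references):
    """Simplified BLEU-1: clipped unigram precision with a brevity
    penalty, scored against one or more reference captions."""
    cand = candidate.split()
    # Clip each candidate word's count by its max count in any reference
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref.split()).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(count, max_ref[word])
                  for word, count in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty against the closest reference length
    ref_len = min((len(r.split()) for r in references),
                  key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision

score = bleu1("a dog runs in the grass",
              ["a dog runs through the grass", "the dog is running"])
print(round(score, 3))  # 0.833
```

Here 5 of the candidate's 6 words appear in a reference ("in" does not), and the candidate matches the closest reference length, so the score is 5/6.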

This demo is based on the excellent e-book Deep Learning for Natural Language Processing by Dr. Jason Brownlee, and is discussed in a couple of blog posts.