
A deep-learning-based image caption generator.


Image Caption Generator

Given an image, this project generates a caption for it using two neural networks: a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM).

The CNN is Xception, used through transfer learning. Its pretrained parameters encode an image as a 2048-dimensional feature vector, which is then fed into an LSTM that predicts a caption from the features Xception extracted.
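The encoding step can be sketched with the Keras applications API. This is a minimal sketch, assuming TensorFlow 2.x with `tf.keras`. The helper names `build_encoder` and `encode_image` are hypothetical. Passing `include_top=False` with `pooling="avg"` is one common way to drop Xception's classification layer and get a 2048-dimensional output; the repository may instead cut the layer off manually.

```python
import numpy as np
import tensorflow as tf


def build_encoder(weights="imagenet"):
    """Return Xception without its classification layer.

    include_top=False drops the ImageNet classification head, and
    pooling="avg" global-average-pools the final feature map, so the
    model maps each 299x299 RGB image to a 2048-dimensional vector.
    """
    return tf.keras.applications.Xception(
        include_top=False,
        weights=weights,  # "imagenet" downloads pretrained weights; None gives random ones
        input_shape=(299, 299, 3),
        pooling="avg",
    )


def encode_image(path, encoder):
    """Load an image file and return its 2048-d Xception encoding."""
    img = tf.keras.utils.load_img(path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)  # float32, shape (299, 299, 3), values in [0, 255]
    x = tf.keras.applications.xception.preprocess_input(x)  # rescale to [-1, 1]
    features = encoder.predict(x[np.newaxis, ...], verbose=0)  # add batch dim -> (1, 2048)
    return features[0]  # shape (2048,)
```

Typical use is `encoder = build_encoder()` followed by `encode_image("example.jpg", encoder)`, where `example.jpg` is a placeholder path. If the Xception weights are kept frozen, these vectors can be computed once per image and cached before training the caption network.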

Model Architecture




  • We remove the last (classification) layer of the Xception network
  • The image is fed into this modified network, which produces a 2048-length encoding of it
  • During training, this 2048-length vector is fed into a second neural network along with a caption for the image
  • This second network contains an LSTM that learns to generate a caption for the image
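The training and generation steps in the list above can be illustrated in plain Python. This sketch assumes each caption is wrapped in hypothetical `startseq`/`endseq` boundary tokens and uses greedy decoding. `predict_next` is a hypothetical stand-in for the trained LSTM decoder, and the word-to-index encoding and padding that a real Keras model would need are left out.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical boundary tokens wrapped around every caption.
START, END = "startseq", "endseq"


def training_pairs(features: Any, caption: str) -> List[Tuple[Any, List[str], str]]:
    """Expand one (image, caption) example into next-word prediction pairs.

    Each pair is (image features, words seen so far, next word): the decoder
    receives the image vector plus a caption prefix and is trained to predict
    the word that follows it (teacher forcing).
    """
    words = [START] + caption.lower().split() + [END]
    return [(features, words[:i], words[i]) for i in range(1, len(words))]


def greedy_caption(
    features: Any,
    predict_next: Callable[[Any, List[str]], str],
    max_len: int = 20,
) -> str:
    """Generate a caption word by word, always taking the decoder's top word.

    `predict_next` stands in for the trained LSTM decoder: given the image
    features and the words so far, it returns the most probable next word.
    """
    words = [START]
    for _ in range(max_len):
        next_word = predict_next(features, words)
        if next_word == END:  # the decoder signals that the caption is complete
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


if __name__ == "__main__":
    for _, prefix, target in training_pairs("features", "A dog runs"):
        print(prefix, "->", target)
    # ['startseq'] -> a
    # ['startseq', 'a'] -> dog
    # ['startseq', 'a', 'dog'] -> runs
    # ['startseq', 'a', 'dog', 'runs'] -> endseq
```

Greedy decoding is the simplest generation strategy; beam search, which keeps several candidate captions at each step, is a common alternative.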

Examples

Here are some captions generated by the network:



References

  • F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 1800-1807, doi: 10.1109/CVPR.2017.195.

  • Hochreiter, Sepp & Schmidhuber, Jürgen. (1997). Long Short-Term Memory. Neural Computation, 9, 1735-1780.

  • LeCun, Yann & Haffner, Patrick & Bengio, Yoshua. (2000). Object Recognition with Gradient-Based Learning.