
Image captioning is the process of generating a textual description of an image. In this project, I generate captions for images using machine learning, computer vision, and deep learning algorithms.

About the Dataset

The dataset can be downloaded from here:

  1. Flickr8k_Dataset (contains the images)
  2. Flickr8k_text (contains the text)

The dataset contains 8000 images, of which 6000 are used for training and the remainder for validation and testing. Each image has 5 captions, so the text file Flickr8k.token.txt holds 8000 * 5 = 40000 captions in total.
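As a rough illustration of how the caption file can be read, here is a minimal sketch. It assumes each line has the form `<image name>#<caption index>`, a tab, then the caption; the function name `load_captions` and the sample file names are hypothetical and not taken from the project.

```python
from collections import defaultdict


def load_captions(token_text):
    """Map each image file name to the list of its captions."""
    captions = defaultdict(list)
    for line in token_text.strip().split("\n"):
        if not line.strip():
            continue
        # Assumed line layout: "<image>.jpg#<index>\t<caption>"
        image_tag, caption = line.split("\t", 1)
        image_name = image_tag.split("#")[0]
        captions[image_name].append(caption.strip())
    return dict(captions)


# Hypothetical sample in the assumed format (not real file names).
sample = (
    "img_001.jpg#0\tA child in a pink dress is climbing stairs .\n"
    "img_001.jpg#1\tA girl going into a wooden building .\n"
    "img_002.jpg#0\tA black dog and a spotted dog are fighting .\n"
)
captions = load_captions(sample)
print(len(captions["img_001.jpg"]))
```

With the real file, the text would be read from disk first and passed in as one string.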

Steps taken in the project

1. Data collection
2. Understanding the data
3. Data Cleaning
4. Loading the training set
5. Data Preprocessing: Images
6. Data Preprocessing: Captions
7. Data Preparation using Generator Function
8. Word Embeddings
9. Model Architecture
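The caption cleaning and generator steps listed above could look roughly like the sketch below. The exact cleaning rules, the `startseq`/`endseq` markers, the left-padding scheme, and all function names are assumptions made for illustration; the project's own code may differ.

```python
import string


def clean_caption(caption):
    """Lowercase, strip punctuation, drop one-letter and non-alphabetic tokens."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.lower().translate(table).split()
    return " ".join(w for w in words if len(w) > 1 and w.isalpha())


def build_vocab(captions):
    """Give every word an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(words)}


def training_pairs(image_features, captions_by_image, word_to_id, max_len):
    """Yield (image feature, padded partial caption, next word id) one at a time."""
    for image_name, caps in captions_by_image.items():
        feature = image_features[image_name]
        for cap in caps:
            ids = [word_to_id[w] for w in cap.split()]
            for i in range(1, len(ids)):
                prefix = ids[:i][-max_len:]
                padded = [0] * (max_len - len(prefix)) + prefix
                yield feature, padded, ids[i]


# Hypothetical toy data: one image with one caption and a fake feature vector.
raw = {"img_001.jpg": ["A dog runs !"]}
cleaned = {
    name: ["startseq " + clean_caption(c) + " endseq" for c in caps]
    for name, caps in raw.items()
}
word_to_id = build_vocab([c for caps in cleaned.values() for c in caps])
features = {"img_001.jpg": [0.1, 0.2]}
pairs = list(training_pairs(features, cleaned, word_to_id, max_len=5))
print(len(pairs))
```

Yielding one sample at a time avoids holding every (image, partial caption) combination in memory at once, which is the usual reason for a generator here.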

Model Architecture


Explaining Encoder and Decoder


For the encoder we use ResNet-50, a convolutional neural network trained on more than a million images from the ImageNet database. The network is 50 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, it has learned rich feature representations for a wide range of images. The network takes input images of size 224 by 224.
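A minimal sketch of using ResNet-50 as a feature extractor with Keras is shown below. It assumes TensorFlow is installed and that the pooled 2048-dimensional output is used as the image feature; the function names `build_encoder` and `encode_image` are illustrative, not the project's own.

```python
IMAGE_SIZE = (224, 224)  # ResNet-50 input size
FEATURE_DIM = 2048       # length of the pooled ResNet-50 feature vector


def build_encoder():
    """Return ResNet-50 without its 1000-way classification layer."""
    # Imported lazily so this file loads even where TensorFlow is absent.
    from tensorflow.keras.applications.resnet50 import ResNet50

    # include_top=False drops the classifier; pooling="avg" averages the
    # final feature map into a single vector per image.
    return ResNet50(weights="imagenet", include_top=False, pooling="avg")


def encode_image(encoder, image_path):
    """Load one image from disk and return its feature vector."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    image = load_img(image_path, target_size=IMAGE_SIZE)
    batch = np.expand_dims(img_to_array(image), axis=0)
    return encoder.predict(preprocess_input(batch), verbose=0)[0]
```

Features are usually computed once per image and cached, since the encoder weights are not updated during caption training in this kind of setup.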


For the decoder we use an LSTM. Long Short-Term Memory (LSTM) networks are a variant of recurrent neural networks (RNNs) designed to retain information over long sequences. Their gating mechanism mitigates the vanishing gradient problem of plain RNNs. LSTMs are well suited to classifying, processing, and predicting sequences with time lags of unknown duration. The network is trained using back-propagation.
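One common way to combine an image feature with an LSTM over the partial caption is sketched below. The layer sizes, dropout rate, the additive merge, and the name `build_caption_model` are all assumptions; the source does not specify how the encoder output and the LSTM are connected.

```python
def build_caption_model(vocab_size, max_len, feature_dim=2048, units=256):
    """Build a model that predicts the next word from an image and a prefix.

    vocab_size should count the padding id 0 as well as every word id.
    """
    # Imported lazily so this file loads even where TensorFlow is absent.
    from tensorflow.keras.layers import (
        LSTM, Dense, Dropout, Embedding, Input, add,
    )
    from tensorflow.keras.models import Model

    # Image branch: project the encoder feature down to `units` dimensions.
    image_input = Input(shape=(feature_dim,))
    image_vec = Dense(units, activation="relu")(Dropout(0.5)(image_input))

    # Text branch: embed the partial caption and summarize it with an LSTM.
    caption_input = Input(shape=(max_len,))
    embedded = Embedding(vocab_size, units)(caption_input)
    caption_vec = LSTM(units)(Dropout(0.5)(embedded))

    # Merge both branches and score every word in the vocabulary.
    merged = Dense(units, activation="relu")(add([image_vec, caption_vec]))
    output = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, caption_input], outputs=output)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model
```

The sparse loss lets the target be a plain word id rather than a one-hot vector, which keeps generator output small.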


The following are a few results obtained after training the model for 70 epochs.

Generated Caption: boy in blue swim trunks jumps into pool.
Generated Caption: small dog jumps over an obstacle.
Generated Caption: surfer is falling off wave.
Generated Caption: little girl in pink dress is laying on her head.
Generated Caption: group of children are posing for picture.
Generated Caption: black dog running through snow.
Generated Caption: man on motorcycle riding on road.
Generated Caption: the basketball player in the orange uniform is trying to make shot.
Generated Caption: child in red coat is skiing down snowy hill.
Generated Caption: soccer player in red uniform about to hit ball.
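
Captions like these are typically produced word by word at inference time. The sketch below shows greedy decoding, which is an assumption here since the source does not say which decoding strategy was used. `predict_next` stands in for the trained model, and the toy predictor and vocabulary are hypothetical.

```python
def greedy_caption(predict_next, word_to_id, max_len):
    """Build a caption by repeatedly picking the highest-scoring next word."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    ids = [word_to_id["startseq"]]
    words = []
    for _ in range(max_len):
        prefix = ids[-max_len:]
        padded = [0] * (max_len - len(prefix)) + prefix
        scores = predict_next(padded)
        # Index of the largest score is the chosen word id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        word = id_to_word.get(next_id)
        if word is None or word == "endseq":
            break
        ids.append(next_id)
        words.append(word)
    return " ".join(words)


# Hypothetical vocabulary and a stand-in predictor that follows a fixed chain.
toy_vocab = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}


def toy_predict(padded):
    follow = {1: 2, 2: 3, 3: 4}
    scores = [0.0] * 5
    scores[follow[padded[-1]]] = 1.0
    return scores


caption = greedy_caption(toy_predict, toy_vocab, max_len=5)
print(caption)
```

With a real model, `predict_next` would wrap a call that takes the image feature and the padded prefix and returns the softmax scores.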