
An LSTM model generates captions for images after a CNN model extracts their features.


Image Caption Generator

An image caption generator is a model that produces a caption describing the contents of an image. It requires a computer vision model to understand the content of the image and an NLP language model to translate that understanding into words.

Deep learning models have proven to be an effective approach to this problem.


Dataset:

FLICKR_8K. This dataset includes around 8,000 images, each paired with 5 different captions written by different people.


Block diagram of the model used in the project



Flow of the Project

1. Cleaning the captions (a minimal cleaning sketch follows this list)

2. Extracting image features

3. Preprocessing the image and text data

4. Training the LSTM model

5. Predicting captions and evaluating performance
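
Step 1 usually reduces to a handful of string operations. The following is a minimal sketch of caption cleaning, assuming lowercasing, punctuation removal, and dropping single-character tokens; the exact rules used in the notebook may differ.

```python
import re

def clean_caption(caption):
    """Lowercase, strip punctuation, and drop single-character tokens."""
    caption = caption.lower()
    caption = re.sub(r"[^a-z ]", " ", caption)          # keep letters and spaces only
    words = [w for w in caption.split() if len(w) > 1]  # remove stray single characters
    return " ".join(words)

print(clean_caption("A dog, running across the grass!"))
# -> "dog running across the grass"
```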


VGG Model Summary

A pre-trained VGG16 model is used to extract image features.
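
A sketch of the feature-extraction step, assuming a standard Keras VGG16 with its final classification layer removed so the 4096-dimensional fc2 output serves as the image feature (the notebook's setup may differ in details such as batching):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

# Drop the final classification layer; the 4096-d fc2 output is the image feature.
vgg = VGG16()
feature_extractor = Model(inputs=vgg.inputs, outputs=vgg.layers[-2].output)

def extract_features(image_path):
    image = load_img(image_path, target_size=(224, 224))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = preprocess_input(image)
    return feature_extractor.predict(image, verbose=0)  # shape: (1, 4096)
```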


LSTM Model Summary
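
The summary corresponds to a merge-style captioning model: the image feature and the partial caption are encoded separately, added together, and fed to a softmax over the vocabulary. The sketch below is illustrative only; the layer sizes (256 units, 0.5 dropout) are assumptions and not necessarily those used in the notebook.

```python
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

def define_model(vocab_size, max_length):
    # Image feature branch: 4096-d VGG16 feature -> 256-d representation
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation="relu")(fe1)

    # Text branch: partial caption -> embedding -> LSTM
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)

    # Merge both branches and predict the next word
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation="relu")(decoder1)
    outputs = Dense(vocab_size, activation="softmax")(decoder2)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```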


Predictions by the model
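
Captions are generated one word at a time. Below is a minimal greedy-decoding sketch; `startseq` and `endseq` are assumed start/end tokens, and `tokenizer` is assumed to be the Keras tokenizer fitted on the cleaned captions.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_feature, max_length):
    """Greedy decoding: repeatedly predict the most likely next word."""
    in_text = "startseq"
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([in_text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = model.predict([photo_feature, seq], verbose=0)
        word = tokenizer.index_word.get(int(np.argmax(yhat)))
        if word is None or word == "endseq":
            break
        in_text += " " + word
    return in_text.replace("startseq", "").strip()
```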


Evaluation using BLEU score
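
A sketch of the evaluation, assuming tokenized reference captions (five per image) and tokenized generated captions, using NLTK's `corpus_bleu`:

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(references, candidates):
    # references: list of lists of tokenized ground-truth captions (5 per image)
    # candidates: list of tokenized generated captions, one per image
    print("BLEU-1: %f" % corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0)))
    print("BLEU-2: %f" % corpus_bleu(references, candidates, weights=(0.5, 0.5, 0, 0)))
    print("BLEU-3: %f" % corpus_bleu(references, candidates, weights=(0.33, 0.33, 0.33, 0)))
    print("BLEU-4: %f" % corpus_bleu(references, candidates, weights=(0.25, 0.25, 0.25, 0.25)))
```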

Good Captions


Bad Captions

Conclusion

The model successfully generates captions for images. Its performance can be further improved through hyperparameter tuning.