About The Project

This is a deep learning model for image captioning: given an image, it returns a single-line description of that image.

Built With

Encoder-decoder architecture
Transfer learning
Beam search
Flickr8k dataset, chosen because it is much smaller than the COCO dataset and was therefore the most feasible to train on

The model reaches a loss of 4.8987, which gives passable results. Everything is implemented in a Jupyter notebook, which should make the code easier to follow.
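
To illustrate the beam search step listed under Built With, here is a minimal sketch in plain Python. It is not the code from the notebook: `beam_search`, `toy_next_log_probs`, and the `TOY_MODEL` table are hypothetical names made up for this example, and the toy table stands in for the trained decoder, which would normally supply the next-word probabilities for a given image.

```python
import math


def beam_search(next_log_probs, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    next_log_probs is a hypothetical callback: it takes the sequence so far
    and returns a dict mapping each candidate next token to its log-probability.
    """
    beams = [([start_token], 0.0)]  # each beam is (sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                # Finished captions are carried over unchanged.
                candidates.append((seq, score))
                continue
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep only the beam_width best partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]


# Toy stand-in for the decoder (assumption): probabilities depend only on the last word.
TOY_MODEL = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def toy_next_log_probs(seq):
    return {tok: math.log(p) for tok, p in TOY_MODEL[seq[-1]].items()}


greedy = beam_search(toy_next_log_probs, "<start>", "<end>", beam_width=1)
wider = beam_search(toy_next_log_probs, "<start>", "<end>", beam_width=2)
print(greedy)
print(wider)
```

With a beam width of 1 the search is greedy and commits to the locally best first word; a wider beam keeps a less likely first word alive and can end up with a caption whose overall probability is higher.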

Dependencies

Keras 1.2.2
TensorFlow 0.12.1
tqdm
numpy
pandas
matplotlib
pickle
PIL
glob

vishaljha2121/ImageCaptioning
