Image Captioning

Implementation of an image captioning model using transfer learning and recurrent neural networks.

Requirements

The following libraries are required to run the code:

TensorFlow
Keras
NumPy
Pandas
OpenCV
Matplotlib
tqdm

The packages listed above can be installed with: pip install -r requirements.txt
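To confirm that the installation worked, a quick import check can help. The following is a minimal sketch, not part of the project's code: the helper name `find_missing` and the dictionary `REQUIRED_MODULES` are made up for illustration, and the only assumption is that each library is importable under its usual module name (OpenCV is imported as `cv2`).

```python
import importlib.util

# Display name -> importable module name. OpenCV installs as "cv2".
REQUIRED_MODULES = {
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "OpenCV": "cv2",
    "Matplotlib": "matplotlib",
    "tqdm": "tqdm",
}


def find_missing(modules):
    """Return the display names of modules that cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

If anything is reported as missing, re-running the pip command above inside the same Python environment is usually the first thing to try.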

Dataset

The neural network was trained on the Flickr8k dataset, obtained from Kaggle. Download the dataset from the link below and place it in your project directory.

Link: https://www.kaggle.com/ming666/flicker8k-dataset
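Once the dataset is downloaded, the captions need to be grouped by image before training. The sketch below is an illustration only and is not taken from this project's code. It assumes the caption file uses the original Flickr8k token format, where each line is an image file name, a `#` and caption index, a tab, and then the caption. The Kaggle copy may be laid out differently, and the function name `parse_captions` and the sample lines are hypothetical.

```python
from collections import defaultdict


def parse_captions(lines):
    """Group captions by image file name.

    Assumes each line looks like: "<image>.jpg#<index>\t<caption>".
    Lines without a tab separator are skipped.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_tag, sep, caption = line.partition("\t")
        if not sep:
            continue  # not in the assumed format
        image_name = image_tag.split("#")[0]
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format (not real dataset entries).
sample = [
    "example_1.jpg#0\tA dog runs through the grass .",
    "example_1.jpg#1\tA brown dog is running outside .",
    "example_2.jpg#0\tA man stands on a mountain .",
]
captions = parse_captions(sample)
print(captions["example_1.jpg"])
```

With a real caption file, the same function can be given an open file handle, since it only iterates over lines.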

Training

Training the neural network takes a long time, and the exact duration depends on your hardware. This model was trained on Nvidia Tesla K80 GPUs for about 3 days. Using a GPU to train the model is strongly recommended.
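Before starting a long training run, it can be worth checking whether TensorFlow actually sees a GPU. This is a small sketch under the assumption that TensorFlow 2.1 or later is installed, where `tf.config.list_physical_devices` is available. The helper name `count_visible_gpus` is hypothetical and not part of this project.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see (0 if TensorFlow is absent)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used for training.
        return 0
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    gpus = count_visible_gpus()
    if gpus:
        print(f"TensorFlow can see {gpus} GPU(s).")
    else:
        print("No GPU visible to TensorFlow; training will run on the CPU.")
```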

I would like to thank Kaggle for letting me use their GPUs for training my neural network.

Contribution

Contributors are free to change the code, either to improve it or to fix bugs that may be present. Please submit your changes as a pull request and avoid making changes directly in the master branch.

To download the weights for the neural network, please contact me at the email address given below.

Email ID : vishalramesh01[at]gmail.com

NOTE: The model weights will not be given to everyone. Whether you receive them will depend on your contributions to the project.

Results

Generated Caption : A man is standing on a snow covered mountain.

Generated Caption : A brown dog is running towards the camera with a stick in its mouth.

References

The following resources were utilised in developing this project.

MIT Deep Learning Basics: Introduction and Overview
https://youtu.be/O5xeyoRL95U

CS231n Winter 2016, Stanford University, Lecture 10: Recurrent Neural Networks and Image Captioning, LSTM, by Andrej Karpathy
https://youtu.be/yCC09vCHzF8

Training Neural Networks, Stanford University
https://youtu.be/wEoyxE0GP2M

Andrej Karpathy's Neuraltalk2
https://github.com/karpathy/neuraltalk2

Automated Image Captioning using ConvNets and Recurrent Nets, by Andrej Karpathy and Fei-Fei Li
https://cs.stanford.edu/people/karpathy/sfmltalk.pdf
