Image-Captioning-WebApp is an application that generates a textual description of any image fed into it and reads the description aloud.
Install all the dependencies listed in requirements.txt, then change to the project directory in PowerShell (Windows) and run:
python caption.py
- Go to https://127.0.0.1:5000/image-caption/ in a web browser.
- Upload the desired image.
- Wait for the image to be processed.
The uploaded image (JPG format) is sent to the backend deep learning model, which converts the image to text. The text describing the image is then read aloud so that a person can understand it.
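A minimal sketch of how this upload-caption-speak flow might be wired up in Flask is shown below. The route internals, the `generate_caption` stand-in, and the use of the gtts package for speech are illustrative assumptions, not the app's actual code:

```python
# Sketch of the upload flow, assuming the flask and gtts packages.
import os
from flask import Flask, request
from gtts import gTTS

app = Flask(__name__)
os.makedirs("uploads", exist_ok=True)
os.makedirs("static", exist_ok=True)

def generate_caption(image_path):
    # Hypothetical stand-in for the trained captioning model.
    return "a dog running on the beach"

@app.route("/image-caption/", methods=["GET", "POST"])
def image_caption():
    if request.method == "POST":
        image = request.files["image"]                 # the uploaded JPG
        path = os.path.join("uploads", image.filename)
        image.save(path)
        caption = generate_caption(path)               # image -> text
        gTTS(caption).save("static/caption.mp3")       # text -> speech
        return caption
    return ('<form method="post" enctype="multipart/form-data">'
            '<input type="file" name="image"><input type="submit"></form>')

if __name__ == "__main__":
    app.run(port=5000)
```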
The model was trained on a subset of 30,000 captions from the MS-COCO dataset and their corresponding images; using more data would improve captioning quality. InceptionV3 (pre-trained on ImageNet) is used to extract features from each image. The vocabulary is limited to the top 5,000 words (to save memory), and all other words are replaced with the token "UNK" (unknown).
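One way the vocabulary cap and the UNK token could be applied is with the Keras tokenizer, as sketched below. The parameters follow `tf.keras.preprocessing.text.Tokenizer`; the example caption and variable names are illustrative, and whether the project uses this exact tokenizer is an assumption:

```python
import tensorflow as tf

# Keep only the 5,000 most frequent words; everything else maps to "UNK".
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=5000,
    oov_token="UNK")

train_captions = ["<start> a dog runs on the beach <end>"]  # example caption
tokenizer.fit_on_texts(train_captions)
seqs = tokenizer.texts_to_sequences(train_captions)

# Pad to a common length so captions can be batched during training.
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding="post")
```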
- Extracted the image features from the lower convolutional layer of InceptionV3, giving a vector of shape (8, 8, 2048).
- Squashed that vector to a shape of (64, 2048).
- Passed the result through the CNN encoder (which consists of a single fully connected layer).
- The RNN (here a GRU) attends over the image features to predict the next word; a sketch of these steps follows below.
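The sketch below traces the feature-extraction and encoder steps with the shapes given above, using `tf.keras`. The embedding dimension of 256 is an assumption, not taken from this README:

```python
import tensorflow as tf

# InceptionV3 without its classification head; the last convolutional
# layer yields an (8, 8, 2048) feature map per 299x299 image.
image_model = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(image_model.input, image_model.output)

img = tf.random.uniform((1, 299, 299, 3))   # stand-in for a preprocessed JPG
features = feature_extractor(img)           # shape (1, 8, 8, 2048)

# Squash the 8x8 spatial grid into 64 locations the attention mechanism
# can attend over.
features = tf.reshape(
    features, (features.shape[0], -1, features.shape[3]))  # (1, 64, 2048)

# CNN encoder: a single fully connected layer projecting each location.
embedding_dim = 256                          # assumed size, not from the README
encoder = tf.keras.layers.Dense(embedding_dim, activation="relu")
encoded = encoder(features)                  # (1, 64, 256)

# The GRU decoder would then attend over `encoded` at each step
# to predict the next word of the caption.
```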
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.