Image captioning is the task of generating a textual description of an image. It combines Natural Language Processing and Computer Vision to generate the captions.
Here we use a Convolutional Neural Network (CNN) to extract feature representations from images, i.e. to comprehend the image content.
For the NLP side, we use a Recurrent Neural Network to process the captions.
The image feature vector and the word embeddings are fed into an LSTM architecture, which is trained to predict a caption for a given image.
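A minimal sketch of this encoder-decoder setup in PyTorch. The class names, layer sizes, and the toy CNN encoder are illustrative assumptions for clarity, not the repository's actual architecture (which may use a pretrained backbone):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy CNN standing in for the image feature extractor."""
    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> one vector per image
        )
        self.fc = nn.Linear(32, embed_size)   # project to the embedding dimension

    def forward(self, images):
        return self.fc(self.conv(images).flatten(1))

class Decoder(nn.Module):
    """LSTM that consumes the image feature, then the caption embeddings."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)          # per-step scores over the vocabulary

# Toy forward pass with random data
enc, dec = Encoder(64), Decoder(64, 128, vocab_size=100)
feats = enc(torch.randn(2, 3, 64, 64))            # batch of 2 RGB images
logits = dec(feats, torch.randint(0, 100, (2, 5)))  # captions of length 5
print(logits.shape)  # torch.Size([2, 6, 100]): image step + 5 word steps
```

During training, the per-step logits are compared against the shifted ground-truth caption with a cross-entropy loss; at inference, words are generated one at a time by feeding each prediction back in.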
Architecture design:
Real-life example:
Applications:
- Assistance for the Visually Impaired
- Biomedical Image Captioning
- Live Transcription
- Media and Publishing Houses
- Social Media Posts

Framework used:
- Python 3.9+
- Jupyter Notebook (or a supported environment)
- Deep learning framework required: PyTorch (1.13.1)
  For its installation, refer to the official website: https://pytorch.org/get-started/locally/
- Other components:
  A. NumPy
  B. Pandas
  C. Matplotlib
  D. scikit-learn
  E. spaCy
  F. NLTK
  G. Pillow
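The packages above can be installed with pip (the names below are their standard PyPI names; the spaCy English pipeline is an assumption, needed only if the notebook tokenizes with spaCy):

```shell
# Core scientific and NLP dependencies
pip install numpy pandas matplotlib scikit-learn spacy nltk pillow

# English tokenizer model for spaCy (if used by the notebook)
python -m spacy download en_core_web_sm
```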
- Clone the project:
  git clone https://github.com/Souradeep2233/Image-Captioning.git
- Download the dataset from this link: https://www.kaggle.com/datasets/adityajn105/flickr8k
- Change the dataset paths referred to in the .ipynb file to match your running environment (cloud platform or local machine).
- Wait patiently for the training to finish; the desired image captioning model will then be ready.
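The path-changing step above usually amounts to editing a couple of variables near the top of the notebook. A hypothetical example (the variable names and directory layout shown are illustrative; adjust them to wherever you extracted the Flickr8k archive):

```python
from pathlib import Path

# Illustrative paths -- point DATA_ROOT at your extracted Flickr8k dataset
DATA_ROOT = Path("/kaggle/input/flickr8k")   # e.g. when running on Kaggle
# DATA_ROOT = Path("./flickr8k")             # e.g. on a local machine

IMAGES_DIR = DATA_ROOT / "Images"            # folder of .jpg images
CAPTIONS_FILE = DATA_ROOT / "captions.txt"   # image-to-caption mapping

print(IMAGES_DIR)
print(CAPTIONS_FILE)
```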
This project has been solely developed by Souradeep Dutta, with the aim of helping the visually impaired and extending the work into the biomedical field. I would appreciate any feedback and suggestions to improve the model's performance and for mutual development 😊.