Image captioning is the task of generating a textual description of an image. It combines Natural Language Processing and Computer Vision to generate the captions.
Here we use a Convolutional Neural Network (CNN) to extract feature representations from images, i.e. to comprehend the image content.
For the NLP side, we use a Recurrent Neural Network to process the captions.
The image feature vector and the word embeddings are fed into an LSTM architecture, which is trained to predict a caption for a given image.
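A minimal sketch of this encoder-decoder setup in PyTorch. The class names, layer sizes, and the toy CNN encoder are illustrative assumptions for clarity, not the repository's actual architecture (which may use a pretrained backbone):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy CNN standing in for the image feature extractor."""
    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> one vector per image
        )
        self.fc = nn.Linear(32, embed_size)   # project to the embedding dimension

    def forward(self, images):
        return self.fc(self.conv(images).flatten(1))

class Decoder(nn.Module):
    """LSTM that consumes the image feature, then the caption embeddings."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)          # per-step scores over the vocabulary

# Toy forward pass with random data
enc, dec = Encoder(64), Decoder(64, 128, vocab_size=100)
feats = enc(torch.randn(2, 3, 64, 64))            # batch of 2 RGB images
logits = dec(feats, torch.randint(0, 100, (2, 5)))  # captions of length 5
print(logits.shape)  # torch.Size([2, 6, 100]): image step + 5 word steps
```

During training, the per-step logits are compared against the shifted ground-truth caption with a cross-entropy loss; at inference, words are generated one at a time by feeding each prediction back in.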
Architecture design:
Real-life example:
Applications:
- Assistance for the Visually Impaired
- Biomedical Image Captioning
- Live Transcription
- Media and Publishing Houses
- Social Media Posts

Framework used:
- Python 3.9+
- Jupyter Notebook (or a supported environment)
- Deep learning framework required: PyTorch (1.13.1)
  For its installation, refer to the official website: https://pytorch.org/get-started/locally/
- Other components:
  A. NumPy
  B. Pandas
  C. Matplotlib
  D. scikit-learn
  E. spaCy
  F. NLTK
  G. Pillow
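The packages above can be installed with pip (the names below are their standard PyPI names; the spaCy English pipeline is an assumption, needed only if the notebook tokenizes with spaCy):

```shell
# Core scientific and NLP dependencies
pip install numpy pandas matplotlib scikit-learn spacy nltk pillow

# English tokenizer model for spaCy (if used by the notebook)
python -m spacy download en_core_web_sm
```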
- Clone the project:
  git clone https://github.com/Souradeep2233/Image-Captioning.git
- Download the dataset from this link: https://www.kaggle.com/datasets/adityajn105/flickr8k
- Change the dataset paths referred to in the .ipynb file to match your running environment (cloud platform or local machine).
- Wait patiently for the training to finish; the desired image captioning model will then be ready.
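The path-changing step above usually amounts to editing a couple of variables near the top of the notebook. A hypothetical example (the variable names and directory layout shown are illustrative; adjust them to wherever you extracted the Flickr8k archive):

```python
from pathlib import Path

# Illustrative paths -- point DATA_ROOT at your extracted Flickr8k dataset
DATA_ROOT = Path("/kaggle/input/flickr8k")   # e.g. when running on Kaggle
# DATA_ROOT = Path("./flickr8k")             # e.g. on a local machine

IMAGES_DIR = DATA_ROOT / "Images"            # folder of .jpg images
CAPTIONS_FILE = DATA_ROOT / "captions.txt"   # image-to-caption mapping

print(IMAGES_DIR)
print(CAPTIONS_FILE)
```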
This project has been solely developed by Souradeep Dutta, with the aim of helping the visually impaired and extending the work into the biomedical field. I would appreciate any feedback and suggestions to improve the model's performance and for mutual development 😊.