Image-Captioning

Image Captioning is the process of generating textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions.

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Image Captioning using Deep Learning (PyTorch)

Here, a Convolutional Neural Network (CNN) extracts features from the images so that the model can classify or comprehend their content.

For the NLP part, a Recurrent Neural Network (RNN) processes the captions.
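Before a recurrent network can process captions, each caption has to be turned into a sequence of integer ids. The following is a minimal sketch of that step, not the notebook's actual code: the regex tokenizer, the special tokens, and the function names `tokenize`, `build_vocab`, and `numericalize` are assumptions made for illustration (the project lists spaCy and NLTK, either of which could replace the tokenizer).

```python
import re
from collections import Counter

# Special tokens are an assumption; the notebook may use different names.
SPECIAL_TOKENS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(captions, min_freq=1):
    """Map every token seen at least min_freq times to an integer id."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    words = sorted(w for w, c in counts.items() if c >= min_freq)
    return {w: i for i, w in enumerate(SPECIAL_TOKENS + words)}


def numericalize(caption, stoi):
    """Convert a caption to ids, wrapped in start and end markers."""
    unk = stoi["<UNK>"]
    ids = [stoi.get(tok, unk) for tok in tokenize(caption)]
    return [stoi["<SOS>"]] + ids + [stoi["<EOS>"]]


captions = ["A dog runs on the grass.", "A child plays with a dog."]
stoi = build_vocab(captions)
ids = numericalize("A dog plays outside", stoi)
print(ids)  # "outside" is not in the vocabulary, so it maps to <UNK>
```

Raising `min_freq` drops rare words from the vocabulary, which keeps the output layer of the decoder smaller at the cost of more `<UNK>` tokens.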

The image feature vector and the word embeddings are fed to an LSTM, which is trained to predict the caption corresponding to a given image.
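The encoder-decoder idea described above can be sketched in PyTorch as follows. This is an illustrative sketch under stated assumptions, not the notebook's model: the class names (`EncoderCNN`, `DecoderLSTM`, `CaptioningModel`), the layer sizes, and the tiny two-layer CNN are all hypothetical. A real setup would more likely use a larger or pretrained CNN as the encoder.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Small stand-in CNN that maps an image to one feature vector."""

    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool: (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images):
        features = self.conv(images).flatten(1)  # (B, 32)
        return self.fc(features)                 # (B, embed_size)


class DecoderLSTM(nn.Module):
    """LSTM that receives the image feature as the first input step."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        embeddings = self.embed(captions)                   # (B, T, E)
        # Prepend the image feature to the word embeddings.
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hidden, _ = self.lstm(inputs)                       # (B, T + 1, H)
        return self.fc(hidden)                              # (B, T + 1, V)


class CaptioningModel(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.encoder = EncoderCNN(embed_size)
        self.decoder = DecoderLSTM(embed_size, hidden_size, vocab_size)

    def forward(self, images, captions):
        return self.decoder(self.encoder(images), captions)


# Random stand-in data: 4 RGB images and 4 captions of 12 token ids each.
vocab_size = 100
images = torch.randn(4, 3, 64, 64)
captions = torch.randint(0, vocab_size, (4, 12))

model = CaptioningModel(embed_size=32, hidden_size=64, vocab_size=vocab_size)
# Feed all tokens except the last; the model predicts every token in turn.
logits = model(images, captions[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), captions.reshape(-1))
print(logits.shape, loss.item())
```

Because the image feature occupies the first input step, passing `captions[:, :-1]` yields one prediction per caption token, so the logits line up with the full caption as the training target.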

Model Architecture and Applications

Architecture design:

[Figure: Image Captioning architecture]

Real-life example:

[Figure: real-life captioning example]

Framework used: PyTorch

Applications:

  1. Assistance for Visually Impaired
  2. Biomedical Image Captioning
  3. Live transcription
  4. Media and Publishing Houses
  5. Social Media Posts

Requirements

  1. Python 3.9+

  2. Jupyter Notebook (or any environment with notebook support)

  3. Deep learning framework: PyTorch (1.13.1)

For installation instructions, refer to the official website: https://pytorch.org/get-started/locally/

Other required libraries:

A. NumPy
B. pandas
C. Matplotlib
D. scikit-learn
E. spaCy
F. NLTK
G. Pillow
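As a quick check that the environment has everything listed above, a small script along these lines can report what is missing. It is a sketch rather than part of the project; the helper name `missing_packages` is hypothetical, and the right-hand values are the import names these libraries are normally installed under.

```python
from importlib.util import find_spec

# Display name -> module name used in "import ..." statements.
REQUIRED = {
    "PyTorch": "torch",
    "NumPy": "numpy",
    "pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "spaCy": "spacy",
    "NLTK": "nltk",
    "Pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be imported."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

The script only checks that each package can be found; it does not verify versions such as PyTorch 1.13.1.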

Run Locally

  1. Clone the project:
  git clone https://github.com/Souradeep2233/Image-Captioning.git

  2. Download the dataset from this link: https://www.kaggle.com/datasets/adityajn105/flickr8k

  3. Change the dataset paths referenced in the .ipynb file to match your running environment (cloud platform or local machine).

  4. Run the notebook and wait for training to finish; the trained model can then generate captions for images.
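For the path-editing step, it can help to keep all dataset locations in one place and confirm they resolve before starting a long training run. The sketch below assumes the Kaggle archive unpacks to an `Images/` folder plus a `captions.txt` file with an `image,caption` header; that layout, the `data/flickr8k` location, and the helper name `parse_captions` are assumptions to adjust to your own setup.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical location; point this at wherever the dataset was unpacked.
DATA_DIR = Path("data/flickr8k")
IMAGES_DIR = DATA_DIR / "Images"           # assumed folder name
CAPTIONS_FILE = DATA_DIR / "captions.txt"  # assumed file name


def parse_captions(lines):
    """Group captions by image file name from 'image,caption' CSV lines."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row["image"]].append(row["caption"].strip())
    return dict(grouped)


if CAPTIONS_FILE.exists():
    with open(CAPTIONS_FILE, newline="", encoding="utf-8") as f:
        captions_by_image = parse_captions(f)
    print(f"Loaded captions for {len(captions_by_image)} images.")
else:
    print(f"Captions file not found at {CAPTIONS_FILE}; update DATA_DIR.")
```

Using `csv.DictReader` rather than splitting on commas keeps captions intact when they contain quoted commas.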

Authors

This project was developed solely by Souradeep Dutta, with the aim of helping the visually impaired and extending the work to the biomedical field. Feedback and suggestions for improving the model's performance are welcome 😊.