Computer Vision and NLP Techniques To Generate Image Captions - Theia

The eyes are a person's primary sense organ. A quick glance around us reveals how much of the information in our surroundings is visual: timetables in train stations, signs showing the correct path or warning of a potential hazard, and billboards advertising a new product are all examples of visual information that we see on a regular basis. For blind and visually impaired people, most of this information is inaccessible, which limits their independence.

Our project is a mobile application to help the visually impaired. It gives a verbal description of the user's surroundings, within the field of view of the mobile phone's camera. Image caption generation (also called photo description) is one application of deep learning: an image is passed to a model, which processes it and generates a caption or description according to its training. The prediction is not always accurate, and the model sometimes generates meaningless sentences. Better results require very high computational power and a very large dataset. The sections below describe what our project does.

With this project, we hope to give a visually impaired person a high-level description of their surroundings.

Goals:

Detect all objects in a given picture and generate text captions.
Give a verbal description of all the objects in the surroundings of a blind person.
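
The second goal can be illustrated with a small sketch. This is not the project's actual code: it assumes a detector has already returned a plain list of object labels for one camera frame, and it only shows one simple way such labels could be turned into a sentence that is ready to be read out. The function name `describe_objects`, the sentence template, and the sample labels are all hypothetical.

```python
from collections import Counter


def describe_objects(labels):
    """Turn a list of detected object labels into one spoken-style sentence.

    `labels` is assumed to be the output of some object detector,
    e.g. ["chair", "dog", "chair"]. Labels are assumed to be non-empty strings.
    """
    if not labels:
        return "No objects were detected."

    counts = Counter(labels)  # keeps labels in first-seen order
    phrases = []
    for label, n in counts.items():
        if n == 1:
            # Pick "a" or "an" from the first letter (a rough heuristic).
            article = "an" if label[0].lower() in "aeiou" else "a"
            phrases.append(f"{article} {label}")
        else:
            # Naive plural: just append "s" (wrong for words like "person").
            phrases.append(f"{n} {label}s")

    if len(phrases) == 1:
        listing = phrases[0]
    else:
        listing = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"In front of you: {listing}."


# Hypothetical detector output for one camera frame.
print(describe_objects(["chair", "dog", "chair", "umbrella"]))
```

In a real application, the returned sentence would be passed to a text-to-speech engine, and the template-based wording would likely be replaced by the caption produced by the trained model.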

Model Architecture and working

Relevant screenshots

Note: The generated descriptions are read out loud for blind users.

Future work

Eyes are one of our most important sense organs, as they help us perceive our surroundings. A growing number of people face problems with their eyesight and become visually impaired. Our application will help such people move about in their day-to-day lives with greater ease and become more independent.

In the future, we plan to deploy our model on a cloud platform to improve its performance. We also plan to make our application available in multiple languages so that people from all over the world can use it.

vishwajeet-hogale/theia
