The primary goal of this project is to develop a deep learning system that assists visually impaired individuals in obtaining information by describing the images they capture. The system combines a CNN-based image encoder with an NLP decoder in a single image captioning model that takes extracted image features as input and generates a text sequence describing the image.
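As a concrete illustration, below is a minimal sketch of such a merge-style CNN-plus-LSTM captioning model in Keras. The vocabulary size, maximum caption length, and layer widths are assumed placeholder values, not figures from the project.

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size
MAX_LEN = 34        # assumed maximum caption length
FEATURE_DIM = 2048  # size of a pooled ResNet50 feature vector

# Image branch: project the extracted CNN features into the decoder space.
img_input = Input(shape=(FEATURE_DIM,))
img_dense = Dense(256, activation="relu")(Dropout(0.5)(img_input))

# Text branch: embed the partial caption and run it through an LSTM.
seq_input = Input(shape=(MAX_LEN,))
seq_embed = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_input)
seq_lstm = LSTM(256)(Dropout(0.5)(seq_embed))

# Merge both branches and predict the next word of the caption.
merged = add([img_dense, seq_lstm])
decoder = Dense(256, activation="relu")(merged)
output = Dense(VOCAB_SIZE, activation="softmax")(decoder)

model = Model(inputs=[img_input, seq_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time, a model of this shape is queried word by word: the image features stay fixed while the partial caption grows until an end token is produced.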
The system incorporates state-of-the-art pre-trained models, such as ResNet50, VGG16, and VGG19, for image feature extraction, paired with LSTM and Bidirectional LSTM networks for text generation (a feature extraction sketch follows below). The candidate combinations were evaluated against one another, and the best-performing model achieved a BLEU score of 0.61. That model was deployed using Flask for the web interface and pyttsx3 for text-to-speech functionality in the app, as sketched after the feature extraction example.
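The following is a minimal sketch of feature extraction with one of these backbones (ResNet50 via Keras), assuming ImageNet weights and global average pooling; the `extract_features` helper name is hypothetical, not from the project.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# Load ResNet50 without its classification head; global average pooling
# collapses the final convolutional maps into a single 2048-d vector.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    """Return a 2048-d feature vector for one image (hypothetical helper)."""
    img = image.load_img(img_path, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(arr, verbose=0)[0]
```

Swapping in VGG16 or VGG19 only changes the import and the feature dimension (512 after pooling instead of 2048).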
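And here is a minimal sketch of the deployment layer, assuming a Flask JSON endpoint and pyttsx3 for offline speech synthesis. `generate_caption` is a hypothetical stand-in for the trained captioning pipeline, and the route and output filename are assumptions.

```python
import pyttsx3
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_caption(img_file):
    # Hypothetical stand-in for the trained CNN-LSTM inference pipeline.
    return "a person riding a bicycle down a city street"

@app.route("/caption", methods=["POST"])
def caption_image():
    img_file = request.files["image"]          # image uploaded by the user
    text = generate_caption(img_file)          # describe the image
    engine = pyttsx3.init()                    # offline text-to-speech engine
    engine.save_to_file(text, "caption.wav")   # synthesize audio for playback
    engine.runAndWait()
    return jsonify({"caption": text, "audio": "caption.wav"})

if __name__ == "__main__":
    app.run(debug=True)
```

Generating the audio server-side keeps the client simple: the app only needs to display the returned caption and play the synthesized file.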