This project is intended to assist blind and visually impaired people. It uses deep learning to build a model that generates captions for images. Each generated caption is then translated into the user's chosen language and read aloud using text-to-speech in that language. The computation for this application can be hosted on any cloud platform; we used Azure to host the model. The image caption generator follows the Show, Attend and Tell paper. For the hardware, we used a Raspberry Pi with a camera, which we consider a better fit for blind users than a mobile phone. The image caption generator used in this project is cloned from here.
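The flow described above (caption, then translate, then speak) can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: `AssistPipeline` and the three stage signatures are hypothetical, and the lambdas in the usage part are stand-ins for the real captioning model, translation service, and text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not the project's real API):
CaptionFn = Callable[[bytes], str]       # image bytes -> English caption
TranslateFn = Callable[[str, str], str]  # (text, target language) -> translated text
SpeechFn = Callable[[str, str], bytes]   # (text, language) -> audio bytes


@dataclass
class AssistPipeline:
    """Chains the three stages: caption -> translate -> speak."""
    caption: CaptionFn
    translate: TranslateFn
    speak: SpeechFn
    language: str = "en"

    def run(self, image: bytes) -> tuple[str, bytes]:
        english = self.caption(image)
        # Skip the translation call when the target language is already English.
        if self.language == "en":
            text = english
        else:
            text = self.translate(english, self.language)
        audio = self.speak(text, self.language)
        return text, audio


# Usage with stub stages standing in for the model, translator and TTS engine.
pipeline = AssistPipeline(
    caption=lambda img: "a dog running on the grass",
    translate=lambda text, lang: f"[{lang}] {text}",
    speak=lambda text, lang: text.encode("utf-8"),
    language="hi",
)
text, audio = pipeline.run(b"fake-jpeg-bytes")
print(text)  # [hi] a dog running on the grass
```

Keeping each stage behind a plain callable means the captioning model could run on a cloud host while the Raspberry Pi only captures the image and plays the returned audio.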
You can download this pretrained model and the corresponding word_map here.
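The word_map is what turns the model's predicted word indices back into readable text. The sketch below assumes word_map is a JSON object mapping each word to an integer index and that it contains special tokens named `<start>`, `<end>` and `<pad>`; check the downloaded file before relying on these names. `load_word_map`, `decode` and the toy map are illustrative only.

```python
import json

# Assumption: these special-token names match the downloaded word_map.
SPECIAL_TOKENS = {"<start>", "<end>", "<pad>"}


def load_word_map(path: str) -> dict[str, int]:
    """Read a word -> index mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def decode(indices: list[int], word_map: dict[str, int]) -> str:
    """Turn a sequence of predicted word indices back into a caption string."""
    rev_word_map = {idx: word for word, idx in word_map.items()}
    words = []
    for idx in indices:
        word = rev_word_map[idx]
        if word == "<end>":
            break  # stop at the end-of-sentence token
        if word not in SPECIAL_TOKENS:
            words.append(word)
    return " ".join(words)


# Toy word map for illustration only; the real word_map is much larger.
word_map = {"<pad>": 0, "a": 1, "dog": 2, "runs": 3, "<start>": 4, "<end>": 5}
print(decode([4, 1, 2, 3, 5, 0], word_map))  # a dog runs
```

The resulting caption string is what would be passed on to translation and text-to-speech.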