/Visual-Question-Answering

The VQA system in this project uses a multimodal approach, combining both image and text data to make predictions. Specifically, it uses a convolutional neural network (CNN) to extract features from the image, and a recurrent neural network (RNN) to process the question text. the CNN and RNN outputs are combined to produce the answer.

Primary LanguageJupyter Notebook

Watchers