Hazem-Abbas/Visual-Question-Answering

The VQA system in this project takes a multimodal approach, combining image and text data to make predictions. Specifically, it uses a convolutional neural network (CNN) to extract features from the image and a recurrent neural network (RNN) to process the question text. The CNN and RNN outputs are then combined to produce the answer.
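The data flow described above can be illustrated with a small forward pass. This is a minimal sketch and not the notebook's actual model: it uses only the Python standard library, random untrained weights, a single convolution layer as the "CNN", and a plain Elman RNN. Fusing the two feature vectors by concatenation is an assumption, since the description only says the outputs are combined. All function and parameter names (`image_features`, `question_features`, `answer_probs`, `init_params`) are hypothetical.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def image_features(image, kernels):
    """Toy CNN: convolution, ReLU, then global average pooling per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        vals = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(vals) / len(vals))
    return feats


def question_features(token_ids, embed, W_xh, W_hh):
    """Toy Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}); returns the last h."""
    h = [0.0] * len(W_hh)
    for t in token_ids:
        a = matvec(W_xh, embed[t])
        b = matvec(W_hh, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def answer_probs(image, token_ids, params):
    """Concatenate image and question features (assumed fusion), then classify."""
    img = image_features(image, params["kernels"])
    txt = question_features(token_ids, params["embed"],
                            params["W_xh"], params["W_hh"])
    fused = img + txt  # list concatenation, not elementwise addition
    return softmax(matvec(params["W_out"], fused))


def init_params(rng, n_kernels=4, vocab=10, embed_dim=5, hidden=6, n_answers=3):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "kernels": [mat(3, 3) for _ in range(n_kernels)],
        "embed": mat(vocab, embed_dim),
        "W_xh": mat(hidden, embed_dim),
        "W_hh": mat(hidden, hidden),
        "W_out": mat(n_answers, n_kernels + hidden),
    }


rng = random.Random(0)
params = init_params(rng)
image = [[rng.random() for _ in range(6)] for _ in range(6)]  # 6x6 toy image
question = [1, 4, 2]  # token ids of an already-tokenized question
probs = answer_probs(image, question, params)
print(probs)  # one probability per candidate answer
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how an image vector and a question vector can meet in a single answer classifier. A real system would typically use a deep learning framework, a pretrained CNN, and a gated RNN such as an LSTM or GRU.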

Jupyter Notebook

Hazem-Abbas
EL-Qalyubeia/Egypt