This repository contains the code for a visual question answering (VQA) system: an AI model that answers natural-language questions about images.
The VQA system in this project uses a multimodal approach, combining image and text inputs to make predictions. Specifically, a convolutional neural network (CNN) extracts features from the image, and a recurrent neural network (RNN) encodes the question text. The outputs of the two encoders are then combined and fed into a final classifier to produce an answer.
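The exact layers live in the training code; as a minimal PyTorch sketch of this kind of CNN+RNN fusion model (the backbone choice, layer sizes, and element-wise fusion below are illustrative assumptions, not the repository's actual architecture):

```python
import torch
import torch.nn as nn
from torchvision import models

class VQAModel(nn.Module):
    """Illustrative CNN+RNN fusion model for VQA (not the repo's exact architecture)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, num_answers=1000):
        super().__init__()
        # CNN image encoder: a pretrained ResNet with its classifier head removed.
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        self.img_fc = nn.Linear(512, hidden_dim)

        # RNN question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Classifier over the fused image/question features.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question_tokens):
        img_feat = self.img_fc(self.cnn(image).flatten(1))   # (B, hidden_dim)
        _, (h_n, _) = self.rnn(self.embed(question_tokens))  # h_n: (1, B, hidden_dim)
        q_feat = h_n.squeeze(0)                               # (B, hidden_dim)
        fused = img_feat * q_feat  # element-wise fusion of the two modalities
        return self.classifier(fused)  # logits over candidate answers
```

Element-wise multiplication is one common lightweight fusion; concatenation followed by a linear layer is an equally plausible choice.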
The system has been trained on the VQA v2 dataset, available at https://visualqa.org/download.html. The code includes data preprocessing, model training, and evaluation scripts, as well as a demo for running the VQA system on new images and questions.
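For reference, the official VQA v2 annotations are plain JSON; a minimal sketch of pairing each question with its ground-truth answer (file names follow the official download, and the repository's preprocessing script may organize this differently):

```python
import json

# File names as distributed on visualqa.org; adjust paths to your extraction directory.
with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = {a["question_id"]: a for a in json.load(f)["annotations"]}

# Pair each question with its most common human answer ("multiple_choice_answer").
pairs = [
    (q["image_id"], q["question"],
     annotations[q["question_id"]]["multiple_choice_answer"])
    for q in questions
]
print(len(pairs), pairs[0])  # e.g. 443757 (458752, 'What is this photo taken looking through?', 'net')
```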
To use the VQA system, open and run the Deployment_Demo.ipynb notebook (e.g. with jupyter notebook Deployment_Demo.ipynb).
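For a sense of what inference looks like end to end, here is a hedged sketch reusing the illustrative VQAModel above; the vocabularies, image file, and preprocessing are stand-ins, not the notebook's actual code:

```python
import torch
from PIL import Image
from torchvision import transforms

# Tiny stand-in vocabularies; the real ones are built during preprocessing.
word_to_idx = {"what": 1, "color": 2, "is": 3, "the": 4, "car": 5}
idx_to_answer = ["yes", "no", "red", "blue"]

# Untrained model, for shape-checking only; the notebook loads trained weights.
model = VQAModel(vocab_size=len(word_to_idx) + 1, num_answers=len(idx_to_answer))
model.eval()

# Standard ImageNet preprocessing for the CNN backbone.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

# Map unknown words to index 0 (a common padding/unknown convention).
question = "what color is the car"
tokens = torch.tensor([[word_to_idx.get(w, 0) for w in question.split()]])

with torch.no_grad():
    logits = model(image, tokens)
print(idx_to_answer[logits.argmax(dim=1).item()])
```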
Contributions are welcome!