
Visual-Question-Answering

Description

This repository contains the code for a visual question answering (VQA) system: a model that answers natural-language questions about images.

The VQA system in this project uses a multimodal approach, combining both image and text data to make predictions. Specifically, it uses a convolutional neural network (CNN) to extract features from the image, and a recurrent neural network (RNN) to process the question text. The outputs from these models are then combined and fed into a final classifier to produce an answer.
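The exact architecture is defined in the notebooks, but a fusion model of this kind might look roughly like the following PyTorch sketch (the class name, layer sizes, and element-wise fusion shown here are illustrative assumptions, not the repository's actual code):

```python
import torch
import torch.nn as nn
from torchvision import models

class VQAModel(nn.Module):
    """Illustrative CNN + RNN fusion model for VQA (names and sizes are hypothetical)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, num_answers=1000):
        super().__init__()
        # CNN branch: pretrained ResNet with its classification head removed
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])   # -> (B, 512, 1, 1)
        self.img_fc = nn.Linear(512, hidden_dim)

        # RNN branch: embed question tokens, encode them with an LSTM
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Classifier over the answer vocabulary
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question):
        img_feat = self.cnn(image).flatten(1)          # (B, 512)
        img_feat = torch.tanh(self.img_fc(img_feat))   # (B, hidden_dim)

        embedded = self.embedding(question)            # (B, T, embed_dim)
        _, (h_n, _) = self.rnn(embedded)
        q_feat = torch.tanh(h_n[-1])                   # (B, hidden_dim)

        fused = img_feat * q_feat                      # element-wise fusion of the two modalities
        return self.classifier(fused)                  # answer logits
```

Element-wise multiplication is one common way to fuse the two modalities; concatenating the features and passing them through a linear layer is another.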

The system has been trained on the VQA v2 dataset. The code includes data preprocessing, model training, and evaluation scripts, as well as a demo script for running the VQA system on new images and questions.
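As a rough illustration of the training step, a minimal loop over (image, question, answer) batches might look like the sketch below; the DataLoader and the VQAModel sketched above are assumptions, and the actual training code lives in the notebooks:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=10, lr=1e-3,
          device="cuda" if torch.cuda.is_available() else "cpu"):
    """Minimal training-loop sketch for a CNN+RNN VQA classifier.

    Assumes train_loader yields (images, questions, answer_labels) batches,
    where answer_labels are class indices into the answer vocabulary.
    """
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for images, questions, answers in train_loader:
            images = images.to(device)
            questions = questions.to(device)
            answers = answers.to(device)

            optimizer.zero_grad()
            logits = model(images, questions)   # (B, num_answers)
            loss = criterion(logits, answers)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(f"epoch {epoch + 1}: mean loss {total_loss / len(train_loader):.4f}")
```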

Dataset

The VQA v2 dataset used for training can be downloaded from https://visualqa.org/download.html
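The question and annotation files from that page are plain JSON. A minimal loading sketch, assuming the train2014 split file names from the download page (the field names reflect the standard VQA v2 format, not code from this repository), could look like:

```python
import json

# File names for the train2014 split; adjust for the split you download.
QUESTIONS_FILE = "v2_OpenEnded_mscoco_train2014_questions.json"
ANNOTATIONS_FILE = "v2_mscoco_train2014_annotations.json"

def load_vqa_pairs(questions_path, annotations_path):
    """Pair each question with its most common ground-truth answer."""
    with open(questions_path) as f:
        questions = {q["question_id"]: q for q in json.load(f)["questions"]}
    with open(annotations_path) as f:
        annotations = json.load(f)["annotations"]

    pairs = []
    for ann in annotations:
        q = questions[ann["question_id"]]
        pairs.append({
            "image_id": q["image_id"],
            "question": q["question"],
            "answer": ann["multiple_choice_answer"],
        })
    return pairs

if __name__ == "__main__":
    pairs = load_vqa_pairs(QUESTIONS_FILE, ANNOTATIONS_FILE)
    print(len(pairs), "question/answer pairs loaded")
    print(pairs[0])
```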

Usage

To use the VQA system, open and run the Deployment_Demo.ipynb notebook.
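Inside such a demo, inference on a new image and question generally follows the pattern below; the checkpoint and vocabulary file names are hypothetical, not the notebook's actual artifacts:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical artifacts -- the demo notebook defines its own model and vocabularies.
model = torch.load("vqa_model.pt", map_location="cpu")   # model saved with torch.save(model, ...)
word2idx = torch.load("question_vocab.pt")               # token -> id mapping
idx2answer = torch.load("answer_vocab.pt")               # class id -> answer string
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def answer_question(image_path, question):
    """Run the trained VQA model on a single image/question pair."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    tokens = [word2idx.get(w, 0) for w in question.lower().rstrip("?").split()]
    question_tensor = torch.tensor([tokens])
    with torch.no_grad():
        logits = model(image, question_tensor)
    return idx2answer[logits.argmax(dim=1).item()]

print(answer_question("example.jpg", "What color is the car?"))
```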

Contributions

Contributions are welcome!