/Real-World-VQA

Visual Question Answering (VQA) related to Real World Images by combining convolutional neural networks (CNNs) and transformer-based language models. Our architecture integrates ResNet50 for image feature extraction and BERT for question encoding. We evaluate our model on the processed DAQUAR dataset.

Primary LanguageJupyter Notebook

Stargazers

No one’s star this repository yet.