WahajAB/Real-World-VQA
Visual Question Answering (VQA) on real-world images, combining convolutional neural networks (CNNs) with transformer-based language models. Our architecture integrates ResNet50 for image feature extraction and BERT for question encoding. We evaluate our model on the processed DAQUAR dataset.
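As a rough illustration of how the two encoders could be combined, here is a minimal PyTorch sketch of a fusion head. It is not taken from the repository: the class name `FusionVQAHead`, the concatenation-based fusion, the hidden size, the dropout rate, and the number of answer classes are all assumptions made for the example. The input sizes assume the 2048-dimensional pooled output of ResNet50 and the 768-dimensional hidden size of a BERT-base model, with both feature vectors extracted beforehand.

```python
import torch
import torch.nn as nn


class FusionVQAHead(nn.Module):
    """Hypothetical fusion head: concatenates image and question features
    and classifies over a fixed answer vocabulary (an assumed design)."""

    def __init__(self, image_dim=2048, text_dim=768, hidden_dim=512, num_answers=100):
        super().__init__()
        # image_dim: pooled ResNet50 feature size.
        # text_dim: BERT-base hidden size (assumes a base-sized BERT).
        # hidden_dim and num_answers are illustrative values only.
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image_features, question_features):
        # Both inputs have shape (batch, dim); fuse along the feature axis.
        fused = torch.cat([image_features, question_features], dim=1)
        # Returns unnormalized scores (logits), one per candidate answer.
        return self.classifier(fused)


if __name__ == "__main__":
    # Random tensors stand in for precomputed ResNet50 and BERT features.
    head = FusionVQAHead(num_answers=10)
    head.eval()
    logits = head(torch.randn(4, 2048), torch.randn(4, 768))
    print(logits.shape)
```

Treating VQA as classification over a closed answer set is a common choice for datasets with short answers, but whether this project does so, and how it actually fuses the two modalities, should be checked against the notebook itself.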
Jupyter Notebook