williamcfrancis/Visual-Question-Answering-using-Stacked-Attention-Networks
Pytorch implementation of VQA using Stacked Attention Networks: Multimodal architecture for image and question input, using CNN and LSTM, with stacked attention layer for improved accuracy (54.82%). Includes visualization of attention layers. Contributions welcome. Utilizes Visual VQA v2.0 dataset.
Jupyter NotebookApache-2.0
Issues
- 3
Trained model availability?
#2 opened by shruum - 1
trained models
#1 opened by idj3tboy