Awesome Visual Question Answering:
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Contributing
Please feel free to send me pull requests or email (leungjokie@gmail.com) to add links. Markdown format:
- [Paper Name](link) - Author 1 et al, **Conference Year**. [[code]](link)
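If you would rather generate entries than type them by hand, the format above can be sketched as a small Python helper. This is only an illustration, not part of this repository: the function name `format_entry`, its parameters, and the `example.com` URLs are all hypothetical.

```python
def format_entry(title, paper_url, first_author, venue, year, code_url=None):
    """Build one list entry in the Markdown format used by this list.

    All parameter names are hypothetical; they simply mirror the
    placeholders in the format line (paper name, link, author, conference, year).
    """
    entry = f"- [{title}]({paper_url}) - {first_author} et al, **{venue} {year}**."
    # The code link is optional; append it only when one is supplied.
    if code_url:
        entry += f" [[code]]({code_url})"
    return entry


if __name__ == "__main__":
    # Placeholder values only; replace them with the real paper details.
    print(format_entry(
        "Paper Name",
        "https://example.com/paper",
        "Author 1",
        "Conference",
        2019,
        code_url="https://example.com/code",
    ))
```

Leaving out `code_url` produces an entry without the trailing code link, which suits papers that have no public implementation.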
Change Log
- Mar. 3rd, 2019: First version released.
Table of Contents
- Contributing
- Change Log
- Table of Contents
- Papers
- VQA Challenge Leaderboard
- Licenses
- Reference and Acknowledgement
Papers
Survey
- Visual question answering: Datasets, algorithms, and future challenges - Kushal Kafle et al, CVIU 2017.
- Visual question answering: A survey of methods and datasets - Qi Wu et al, CVIU 2017.
2019
CVPR 2019
- Information Maximizing Visual Question Generation - Ranjay Krishna et al, CVPR 2019. [code]
- Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence - Amir Zadeh et al, CVPR 2019. [code]
- Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, CVPR 2019. [code]
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering - Hyeonwoo Noh et al, CVPR 2019. [code]
- Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, CVPR 2019. [code]
- Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, CVPR 2019. [code]
- MUREL: Multimodal Relational Reasoning for Visual Question Answering - Remi Cadene et al, CVPR 2019. [code]
- Image-Question-Answer Synergistic Network for Visual Dialog - Dalu Guo et al, CVPR 2019. [code]
- RAVEN: A Dataset for Relational and Analogical Visual rEasoNing - Chi Zhang et al, CVPR 2019. [project page]
AAAI 2019
- Differential Networks for Visual Question Answering - Chenfei Wu et al, AAAI 2019. [code]
- BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection - Hedi Ben-younes et al, AAAI 2019. [code]
- Dynamic Capsule Attention for Visual Question Answering - Yiyi Zhou et al, AAAI 2019. [code]
- Structured Two-stream Attention Network for Video Question Answering - Lianli Gao et al, AAAI 2019. [code]
- Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering - Xiangpeng Li et al, AAAI 2019. [code]
- WK-VQA: World Knowledge-enabled Visual Question Answering - Sanket Shah et al, AAAI 2019. [code]
- Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning - Yiyi Zhou et al, AAAI 2019. [code]
OTHER
- Focal Visual-Text Attention for Memex Question Answering - Junwei Liang et al, TPAMI 2019. [code]
- Combining Multiple Cues for Visual Madlibs Question Answering - Tatiana Tommasi et al, IJCV 2019. [code]
- Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation - Sang-Woo Lee et al, ICLR 2019. [code]
2018
NIPS 2018
- Bilinear Attention Networks - Jin-Hwa Kim et al, NIPS 2018. [code]
- Chain of Reasoning for Visual Question Answering - Chenfei Wu et al, NIPS 2018. [code]
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering - Will Norcliffe-Brown et al, NIPS 2018. [code]
- Learning to Specialize with Knowledge Distillation for Visual Question Answering - Jonghwan Mun et al, NIPS 2018. [code]
- Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering - Medhini Narasimhan et al, NIPS 2018. [code]
- Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - Sainandan Ramakrishnan et al, NIPS 2018. [code]
AAAI 2018
- Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering - Somak Aditya et al, AAAI 2018. [code]
- Co-Attending Free-Form Regions and Detections with Multi-Modal Multiplicative Feature Embedding for Visual Question Answering - Pan Lu et al, AAAI 2018. [code]
- Exploring Human-Like Attention Supervision in Visual Question Answering - Tingting Qiao et al, AAAI 2018. [code]
- Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents - Bo Wang et al, AAAI 2018. [code]
IJCAI 2018
- Feature Enhancement in Attention for Visual Question Answering - Yuetan Lin et al, IJCAI 2018. [code]
- A Question Type Driven Framework to Diversify Visual Question Generation - Zhihao Fan et al, IJCAI 2018. [code]
- Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network - Zhou Zhao et al, IJCAI 2018. [code]
- Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks - Zhou Zhao et al, IJCAI 2018. [code]
CVPR 2018
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Peter Anderson et al, CVPR 2018. [code(author)] [code(pythiaV0.1)] [code(PyTorch Reimplementation)]
- Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge - Damien Teney et al, CVPR 2018. [code]
- Learning by Asking Questions - Ishan Misra et al, CVPR 2018. [code]
- Embodied Question Answering - Abhishek Das et al, CVPR 2018. [code]
- VizWiz Grand Challenge: Answering Visual Questions From Blind People - Danna Gurari et al, CVPR 2018. [code]
- Textbook Question Answering Under Instructor Guidance With Memory Networks - Juzheng Li et al, CVPR 2018. [code]
- IQA: Visual Question Answering in Interactive Environments - Daniel Gordon et al, CVPR 2018. [sample video]
- Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - Aishwarya Agrawal et al, CVPR 2018. [code]
- Learning Answer Embeddings for Visual Question Answering - Hexiang Hu et al, CVPR 2018. [code]
- DVQA: Understanding Data Visualizations via Question Answering - Kushal Kafle et al, CVPR 2018. [code]
- Cross-Dataset Adaptation for Visual Question Answering - Wei-Lun Chao et al, CVPR 2018. [code]
- Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering - Unnat Jain et al, CVPR 2018. [code]
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering - Duy-Kien Nguyen et al, CVPR 2018. [code]
- Visual Question Generation as Dual Task of Visual Question Answering - Yikang Li et al, CVPR 2018. [code]
- Focal Visual-Text Attention for Visual Question Answering - Junwei Liang et al, CVPR 2018. [code]
- Motion-Appearance Co-Memory Networks for Video Question Answering - Jiyang Gao et al, CVPR 2018. [code]
- Visual Question Answering With Memory-Augmented Networks - Chao Ma et al, CVPR 2018. [code]
- Visual Question Reasoning on General Dependency Tree - Qingxing Cao et al, CVPR 2018. [code]
- Differential Attention for Visual Question Answering - Badri Patro et al, CVPR 2018. [code]
- Learning Visual Knowledge Memory Networks for Visual Question Answering - Zhou Su et al, CVPR 2018. [code]
- IVQA: Inverse Visual Question Answering - Feng Liu et al, CVPR 2018. [code]
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering - Andrew Shin et al, CVPR 2018. [code]
ACM MM 2018
- Object-Difference Attention: A simple relational attention for Visual Question Answering - Chenfei Wu et al, ACM MM 2018. [code]
- Enhancing Visual Question Answering Using Dropout - Zhiwei Fang et al, ACM MM 2018. [code]
- Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering - Xuanyi Dong et al, ACM MM 2018. [code]
- Explore Multi-Step Reasoning in Video Question Answering - Xiaomeng Song et al, ACM MM 2018. [code] [SVQA dataset]
ECCV 2018
- Visual Question Answering as a Meta Learning Task - Damien Teney et al, ECCV 2018. [code]
- Question-Guided Hybrid Convolution for Visual Question Answering - Peng Gao et al, ECCV 2018. [code]
- Goal-Oriented Visual Question Generation via Intermediate Rewards - Junjie Zhang et al, ECCV 2018. [code]
- Multimodal Dual Attention Memory for Video Story Question Answering - Kyung-Min Kim et al, ECCV 2018. [code]
- A Joint Sequence Fusion Model for Video Question Answering and Retrieval - Youngjae Yu et al, ECCV 2018. [code]
- Deep Attention Neural Tensor Network for Visual Question Answering - Yalong Bai et al, ECCV 2018. [code]
- Question Type Guided Attention in Visual Question Answering - Yang Shi et al, ECCV 2018. [code]
- Learning Visual Question Answering by Bootstrapping Hard Attention - Mateusz Malinowski et al, ECCV 2018. [code]
- Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering - Medhini Narasimhan et al, ECCV 2018. [code]
- Visual Question Generation for Class Acquisition of Unknown Objects - Kohei Uehara et al, ECCV 2018. [code]
OTHER
- Image Captioning and Visual Question Answering Based on Attributes and External Knowledge - Qi Wu et al, TPAMI 2018. [code]
- FVQA: Fact-Based Visual Question Answering - Peng Wang et al, TPAMI 2018. [code]
- R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering - Pan Lu et al, SIGKDD 2018. [code(Dataset)]
- Interpretable Counting for Visual Question Answering - Alexander Trott et al, ICLR 2018. [code]
- Learning to Count Objects in Natural Images for Visual Question Answering - Yan Zhang et al, ICLR 2018. [code]
- A Better Way to Attend: Attention With Trees for Video Question Answering - Hongyang Xue et al, TIP 2018. [code]
- Zero-Shot Transfer VQA Dataset - Pan Lu et al, arXiv preprint. [code]
2017-2015
OTHER
For other VQA papers from 2015 to 2017, please see awesome-vqa by JamesChuanggg. That project does not appear to have been maintained for a long time, but I really appreciate his work and plan to merge it into this list in the future. Stay tuned...
ICCV 2017
- Learning to Reason: End-to-End Module Networks for Visual Question Answering - Ronghang Hu et al, ICCV 2017. [code]
- Structured Attentions for Visual Question Answering - Chen Zhu et al, ICCV 2017. [code]
- VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation - Chuang Gan et al, ICCV 2017. [code]
- Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering - Zhou Yu et al, ICCV 2017. [code]
- An Analysis of Visual Question Answering Algorithms - Kushal Kafle et al, ICCV 2017. [code]
- MUTAN: Multimodal Tucker Fusion for Visual Question Answering - Hedi Ben-younes et al, ICCV 2017. [code]
- MarioQA: Answering Questions by Watching Gameplay Videos - Jonghwan Mun et al, ICCV 2017. [code]
- Learning to Disambiguate by Asking Discriminative Questions - Yining Li et al, ICCV 2017. [code]
VQA Challenge Leaderboard
I will collect implementations of the leaderboard entries in the future. Stay tuned...
test-std 2018
test-std 2017
Licenses
To the extent possible under law, Jokie Leung has waived all copyright and related or neighboring rights to this work.
Reference and Acknowledgement
I really appreciate their contributions in this area.