A reading list of papers about Visual Question Answering.
- GQA [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[paper][dataset][中文解读]
- VQA-CP [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[paper][dataset][中文解读]
- VQA v2.0 [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[paper][dataset]
- Visual7W [2016][CVPR] Visual7W: Grounded Question Answering in Images.[paper][dataset][中文解读]
- SHAPES [2016][CVPR] Neural Module Networks.[paper][dataset][中文解读]
- FM-IQA [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[paper][dataset][中文解读]
- VQA v1.0 [2015][ICCV] VQA: Visual Question Answering.[paper][dataset][中文解读]
- Visual Madlibs [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[paper][dataset][中文解读]
- DAQUAR-Consensus [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[paper][dataset][中文解读]
- DAQUAR [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[paper][dataset][中文解读]
- [2021][AAAI] Regularizing Attention Networks for Anomaly Detection in Visual Question Answering.[paper][中文解读]
- [2021][CVPR] Causal Attention for Vision-Language Tasks.[paper][中文解读]
- [2021][CVPR] Counterfactual VQA: A Cause-Effect Look at Language Bias.[paper]
- [2021][CVPR] Domain-robust VQA with diverse datasets and methods but no target labels.[paper][中文解读]
- [2021][CVPR] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules.[paper]
- [2021][CVPR] Separating Skills and Concepts for Novel Visual Question Answering.[paper]
- [2021][CVPR] Transformation Driven Visual Reasoning.[paper]
- [2021][CVPR] Predicting Human Scanpaths in Visual Question Answering.[paper]
- [2021][CVPR] Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing.[paper]
- [2021][CVPR] Roses are Red, Violets are Blue... But Should VQA expect Them To?.[paper]
- [2021][CVPR] How Transferable are Reasoning Patterns in VQA?.[paper]
- [2021][CVPR] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[paper]
- [2021][CVPR] TAP: Text-Aware Pre-training for Text-VQA and Text-Caption.[paper]
- [2021][ICCV] Greedy Gradient Ensemble for Robust Visual Question Answering.[paper]
- [2021][ICCV] Auto-Parsing Network for Image Captioning and Visual Question Answering.[paper]
- [2021][ICCV] Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering.[paper]
- [2021][ICCV] Linguistically Routing Capsule Network for Out-of-distribution Visual Question Answering.[paper]
- [2021][ICCV] Weakly Supervised Relative Spatial Reasoning for Visual Question Answering.[paper]
- [2021][ICCV] Unshuffling Data for Improved Generalization in Visual Question Answering.[paper]
- [2021][TIP] Re-Attention for Visual Question Answering.[paper]
- [2021][SIGIR] LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering.[paper]
- [2020][arXiv] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[paper]
- [2020][AAAI] Overcoming Language Priors in VQA via Decomposed Linguistic Representations.[paper]
- [2020][ACL] Cross-Modality Relevance for Reasoning on Language and Vision.[paper][中文解读]
- [2020][CVPR] Counterfactual Samples Synthesizing for Robust Visual Question Answering.[paper]
- [2020][CVPR] Counterfactual Vision and Language Learning.[paper]
- [2020][CVPR] Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention.[paper]
- [2020][CVPR] Hypergraph Attention Networks for Multimodal Learning.[paper]
- [2020][CVPR] In Defense of Grid Features for Visual Question Answering.[paper]
- [2020][CVPR] Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.[paper][中文解读]
- [2020][CVPR] On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering.[paper]
- [2020][CVPR] SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions.[paper]
- [2020][CVPR] TA-Student VQA: Multi-Agents Training by Self-Questioning.[paper]
- [2020][CVPR] Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing.[paper]
- [2020][CVPR] Visual Commonsense R-CNN.[paper]
- [2020][CVPR] VQA with No Questions-Answers Training.[paper]
- [2020][ECCV][oral] A Competence-aware Curriculum for Visual Concepts Learning via Question Answering.[paper]
- [2020][ECCV][poster] Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision.[paper]
- [2020][ECCV][poster] Multi-Agent Embodied Question Answering in Interactive Environments.[paper]
- [2020][ECCV][poster] Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder.[paper]
- [2020][ECCV][poster] Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering.[paper]
- [2020][ECCV][poster] TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering.[paper][中文解读]
- [2020][ECCV][poster] Visual Question Answering on Image Sets.[paper]
- [2020][ECCV][poster] VQA-LOL: Visual Question Answering under the Lens of Logic.[paper]
- [2020][EMNLP] MUTANT: A Training Paradigm for Out-of-Distribution Generalization in VQA.[paper][中文解读]
- [2020][IJCAI] Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering.[paper]
- [2020][NeurIPS] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering.[paper][中文解读]
- [2020][TMM] Self-Adaptive Neural Module Transformer for Visual Question Answering.[paper]
- [2019][AAAI] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection.[paper]
- [2019][AAAI] Lattice CNNs for Matching Based Chinese Question Answering.[paper]
- [2019][AAAI] TallyQA: Answering Complex Counting Questions.[paper]
- [2019][ACMMM] Perceptual Visual Reasoning with Knowledge Propagation.[paper]
- [2019][CVPR] Cycle-Consistency for Robust Visual Question Answering.[paper]
- [2019][CVPR] Embodied Question Answering in Photorealistic Environments with Point Cloud Perception.[paper]
- [2019][CVPR] Explainable and Explicit Visual Reasoning over Scene Graphs.[paper]
- [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[paper][中文解读]
- [2019][CVPR] It’s not about the Journey; It’s about the Destination: Following Soft Paths under Question-Guidance for Visual Reasoning.[paper]
- [2019][CVPR] MUREL: Multimodal Relational Reasoning for Visual Question Answering.[paper]
- [2019][CVPR] Towards VQA Models That Can Read.[paper][中文解读]
- [2019][CVPR] Transfer Learning via Unsupervised Task Discovery for Visual Question Answering.[paper][中文解读]
- [2019][CVPR] Visual Question Answering as Reading Comprehension.[paper][中文解读]
- [2019][ICCV] Compact Trilinear Interaction for Visual Question Answering.[paper][中文解读]
- [2019][ICCV] Language-Conditioned Graph Networks for Relational Reasoning.[paper][中文解读]
- [2019][ICCV] Multi-modality Latent Interaction Network for Visual Question Answering.[paper][中文解读]
- [2019][ICCV] Relation-Aware Graph Attention Network for Visual Question Answering.[paper][中文解读]
- [2019][ICCV] Scene Text Visual Question Answering.[paper][中文解读]
- [2019][ICCV] Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded.[paper][中文解读]
- [2019][ICCV] U-CAM: Visual Explanation using Uncertainty based Class Activation Maps.[paper][中文解读]
- [2019][ICCV] Why Does a Visual Question Have Different Answers?.[paper][中文解读]
- [2019][ICLR] The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision.[paper][中文解读]
- [2019][ICLR] Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering.[paper][中文解读]
- [2019][ICLR] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering.[paper][中文解读]
- [2019][ICLR] Visual Reasoning by Progressive Module Networks.[paper][中文解读]
- [2019][ICML] Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering.[paper][中文解读]
- [2019][NeurIPS] Analyzing Compositionality of Visual Question Answering.[paper](not found)[中文解读]
- [2019][NeurIPS] Heterogeneous Graph Learning for Visual Commonsense Reasoning.[paper][中文解读]
- [2019][NeurIPS] Learning by Abstraction: The Neural State Machine.[paper][中文解读]
- [2019][NeurIPS] Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning.[paper][中文解读]
- [2019][NeurIPS] RUBi: Reducing Unimodal Biases in Visual Question Answering.[paper][中文解读]
- [2019][NeurIPS] Self-Critical Reasoning for Robust Visual Question Answering.[paper][中文解读]
- [2019][NeurIPS] Visual Concept-Metaconcept Learning.[paper][中文解读]
- [2018][AAAI] Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering.[paper][中文解读]
- [2018][AAAI] Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering.[paper][中文解读]
- [2018][AAAI] Exploring Human-like Attention Supervision in Visual Question Answering.[paper][中文解读]
- [2018][AAAI] Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents.[paper][中文解读]
- [2018][CVPR] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Cross-Dataset Adaptation for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Customized Image Narrative Generation via Interactive Visual Question Generation and Answering.[paper][中文解读]
- [2018][CVPR] Differential Attention for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] DVQA: Understanding Data Visualizations via Question Answering.[paper][中文解读]
- [2018][CVPR] Embodied Question Answering.[paper][中文解读]
- [2018][CVPR] Focal Visual-Text Attention for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] IQA: Visual Question Answering in Interactive Environments.[paper][中文解读]
- [2018][CVPR] iVQA: Inverse Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Learning Answer Embeddings for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Learning by Asking Questions.[paper][中文解读]
- [2018][CVPR] Learning Visual Knowledge Memory Networks for Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.[paper][中文解读]
- [2018][CVPR] Textbook Question Answering under Instructor Guidance with Memory Networks.[paper][中文解读]
- [2018][CVPR] Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge.[paper][中文解读]
- [2018][CVPR] Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning.[paper][中文解读]
- [2018][CVPR] Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering.[paper][中文解读]
- [2018][CVPR] Visual Question Answering with Memory-Augmented Networks.[paper][中文解读]
- [2018][CVPR] Visual Question Generation as Dual Task of Visual Question Answering.[paper][中文解读]
- [2018][CVPR] Visual Question Reasoning on General Dependency Tree.[paper][中文解读]
- [2018][CVPR] VizWiz Grand Challenge: Answering Visual Questions from Blind People.[paper][中文解读]
- [2018][ECCV] A Dataset and Architecture for Visual Reasoning with a Working Memory.[paper][中文解读]
- [2018][ECCV] Deep Attention Neural Tensor Network for Visual Question Answering.[paper][中文解读]
- [2018][ECCV] Explainable Neural Computation via Stack Neural Module Networks.[paper][中文解读]
- [2018][ECCV] Goal-Oriented Visual Question Generation via Intermediate Rewards.[paper][中文解读]
- [2018][ECCV] Grounding Visual Explanations.[paper][中文解读]
- [2018][ECCV] Learning Visual Question Answering by Bootstrapping Hard Attention.[paper][中文解读]
- [2018][ECCV] Question Type Guided Attention in Visual Question Answering.[paper][中文解读]
- [2018][ECCV] Question-Guided Hybrid Convolution for Visual Question Answering.[paper][中文解读]
- [2018][ECCV] Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering.[paper][中文解读]
- [2018][ECCV] Visual Question Answering as a Meta Learning Task.[paper][中文解读]
- [2018][ECCV] Visual Question Generation for Class Acquisition of Unknown Objects.[paper][中文解读]
- [2018][ECCV] VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions.[paper][中文解读]
- [2018][ICLR] Compositional Attention Networks for Machine Reasoning.[paper][中文解读]
- [2018][ICLR] Interpretable Counting for Visual Question Answering.[paper][中文解读]
- [2018][ICLR] Learning to Count Objects in Natural Images for Visual Question Answering.[paper][中文解读]
- [2018][IJCAI] A Question Type Driven Framework to Diversify Visual Question Generation.[paper][中文解读]
- [2018][IJCAI] Feature Enhancement in Attention for Visual Question Answering.[paper][中文解读]
- [2018][IJCAI] From Pixels to Objects: Cubic Visual Attention for Visual Question Answering.[paper][中文解读]
- [2018][NIPS] Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog.[paper][中文解读]
- [2018][NIPS] Bilinear Attention Networks.[paper][中文解读]
- [2018][NIPS] Chain of Reasoning for Visual Question Answering.[paper][中文解读]
- [2018][NIPS] Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base.[paper][中文解读]
- [2018][NIPS] Learning Conditioned Graph Structures for Interpretable Visual Question Answering.[paper][中文解读]
- [2018][NIPS] Learning to Specialize with Knowledge Distillation for Visual Question Answering.[paper][中文解读]
- [2018][NIPS] Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.[paper][中文解读]
- [2018][NIPS] Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering.[paper][中文解读]
- [2018][NIPS] Overcoming Language Priors in Visual Question Answering with Adversarial Regularization.[paper][中文解读]
- [2017][CVPR] An Empirical Evaluation of Visual Question Answering for Novel Objects.[paper][中文解读]
- [2017][CVPR] Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension.[paper][中文解读]
- [2017][CVPR] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.[paper][中文解读]
- [2017][CVPR] Creativity: Generating Diverse Questions using Variational Autoencoders.[paper][中文解读]
- [2017][CVPR] Graph-Structured Representations for Visual Question Answering.[paper][中文解读]
- [2017][CVPR] Knowledge Acquisition for Visual Question Answering via Iterative Querying.[paper][中文解读]
- [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[paper][中文解读]
- [2017][CVPR] Mining Object Parts from CNNs via Active Question-Answering.[paper][中文解读]
- [2017][CVPR] Multi-level Attention Networks for Visual Question Answering.[paper]
- [2017][CVPR] The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions.[paper]
- [2017][CVPR] What’s in a Question: Using Visual Questions as a Form of Supervision.[paper]
- [2017][ICCV] An Analysis of Visual Question Answering Algorithms.[paper]
- [2017][ICCV] Inferring and Executing Programs for Visual Reasoning.[paper]
- [2017][ICCV] Learning to Disambiguate by Asking Discriminative Questions.[paper]
- [2017][ICCV] Learning to Reason: End-to-End Module Networks for Visual Question Answering.[paper]
- [2017][ICCV] Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering.[paper]
- [2017][ICCV] MUTAN: Multimodal Tucker Fusion for Visual Question Answering.[paper]
- [2017][ICCV] Structured Attentions for Visual Question Answering.[paper]
- [2017][ICCV] VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.[paper]
- [2017][IJCAI] Automatic Generation of Grounded Visual Questions.[paper]
- [2017][IJCAI] Explicit Knowledge-based Reasoning for Visual Question Answering.[paper]
- [2017][INLG] Data Augmentation for Visual Question Answering.[paper]
- [2017][NIPS] High-Order Attention Models for Visual Question Answering.[paper]
- [2017][NIPS] Multimodal Learning and Reasoning for Visual Question Answering.[paper]
- [2017][NIPS] Question Asking as Program Generation.[paper]
- [2016][AAAI] Learning to Answer Questions from Image Using Convolutional Neural Network.[paper][中文解读]
- [2016][CVPR] Answer-Type Prediction for Visual Question Answering.[paper][中文解读]
- [2016][CVPR] Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources.[paper][中文解读]
- [2016][CVPR] Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction.[paper][中文解读]
- [2016][CVPR] Neural Module Networks.[paper][中文解读]
- [2016][CVPR] Stacked Attention Networks for Image Question Answering.[paper][中文解读]
- [2016][CVPR] Visual7W: Grounded Question Answering in Images.[paper][中文解读]
- [2016][CVPR] Where to Look: Focus Regions for Visual Question Answering.[paper][中文解读]
- [2016][CVPR] Yin and Yang: Balancing and Answering Binary Visual Questions.[paper][中文解读]
- [2016][ECCV] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering.[paper][中文解读]
- [2016][ECCV] Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering.[paper][中文解读]
- [2016][ECCV] Leveraging Visual Question Answering for Image-Caption Ranking.[paper][中文解读]
- [2016][ECCV] Revisiting Visual Question Answering Baselines.[paper][中文解读]
- [2016][ICML] Dynamic Memory Networks for Visual and Textual Question Answering.[paper][中文解读]
- [2016][NIPS] Hierarchical Question-Image Co-Attention for Visual Question Answering.[paper][中文解读]
- [2016][NIPS] Multimodal Residual Learning for Visual QA.[paper][中文解读]
- [2015][CVPR] VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases.[paper][中文解读]
- [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[paper][中文解读]
- [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[paper][中文解读]
- [2015][ICCV] VQA: Visual Question Answering.[paper][中文解读]
- [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[paper][中文解读]
- [2015][NIPS] Exploring Models and Data for Image Question Answering.[paper][中文解读]
- [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[paper][中文解读]
- [2021][CVPR] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events.[paper]
- [2021][CVPR] Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering.[paper]
- [2021][CVPR] NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions.[paper]
- [2021][ICCV] On the hidden treasure of dialog in video question answering.[paper]
- [2021][ICCV] Just Ask: Learning to Answer Questions from Millions of Narrated Videos.[paper]
- [2021][ICCV] Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments.[paper]
- [2021][ICCV] HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering.[paper]
- [2021][ICCV] Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.[paper]
- [2021][ICCV] Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature.[paper]
- [2020][CVPR] Hierarchical Conditional Relation Networks for Video Question Answering.[paper]
- [2020][CVPR] Modality Shifting Attention Network for Multi-Modal Video Question Answering.[paper]
- [2020][ECCV][poster] Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions.[paper]
- [2020][TIP] Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.[paper]
- [2020][WACV] BERT Representations for Video Question Answering.[paper]
- [2019][AAAI] Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering.[paper]
- [2019][AAAI] Structured Two-stream Attention Network for Video Question Answering.[paper]
- [2019][ACMMM] Learnable Aggregating Net with Divergent Loss for VideoQA.[paper]
- [2019][ACMMM] Multi-interaction Network with Object Relation for VideoQA.[paper]
- [2019][ACMMM] Question-Aware Tube-Switch Network for VideoQA.[paper]
- [2019][CVPR] Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA.[paper]
- [2019][CVPR] Progressive Attention Memory Network for Movie Story Question Answering.[paper]
- [2019][ICCV] SegEQA: Video Segmentation based Visual Attention for Embodied Question Answering.[paper]
- [2019][IJCAI] Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks.[paper]
- [2019][IJCNN] Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering.[paper]
- [2019][TIP] Compositional Attention Networks With Two-Stream Fusion for Video Question Answering.[paper]
- [2019][TIP] Holistic Multi-modal Memory Network for Movie Question Answering.[paper]
- [2018][ACMMM] Explore Multi-Step Reasoning in Video Question Answering.[paper]
- [2018][CVPR] Motion-Appearance Co-Memory Networks for Video Question Answering.[paper]
- [2018][ECCV] A Joint Sequence Fusion Model for Video Question Answering and Retrieval.[paper]
- [2018][ECCV] Multimodal Dual Attention Memory for Video Story Question Answering.[paper]
- [2018][EMNLP] TVQA: Localized, Compositional Video Question Answering.[paper]
- [2017][AAAI] Leveraging Video Descriptions to Learn Video Question Answering.[paper]
- [2017][ACMMM] VideoQA via Gradually Refined Attention over Appearance and Motion.[paper]
- [2017][ACMMM] VideoQA via Hierarchical Dual-Level Attention Network Learning.[paper]
- [2017][CVPR] A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering.[paper]
- [2017][CVPR] End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering.[paper]
- [2017][CVPR] TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.[paper]
- [2017][ICCV] MarioQA: Answering Questions by Watching Gameplay Videos.[paper]
- [2017][ICCV] Video Fill In the Blank using LRRL LSTMs with Spatial-Temporal Attentions.[paper]
- [2017][IJCAI] Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.[paper]
- [2017][SIGIR] Video Question Answering via Attributed-Augmented Attention Network Learning.[paper]
- [2016][CVPR] MovieQA: Understanding Stories in Movies through Question-Answering.[paper]
- [2015][arXiv] Uncovering the temporal context for video question and answering.[paper]
- [2014][ACMMM] Joint video and text parsing for understanding events and answering queries.[paper]