Motion-Appearance Co-Memory Networks for Video Question Answering
tenaflyyy opened this issue · 0 comments
Abstract
- Three unique attributes of video QA compared with image QA:
  - it deals with long sequences of images;
  - motion and appearance information are usually correlated with each other and provide useful attention cues;
  - different questions require different numbers of frames to infer the answer.
- Proposed Method
  - Builds on concepts from the Dynamic Memory Network (DMN) and introduces new mechanisms.
  - Three salient aspects:
    - utilizes cues from both motion and appearance to generate attention;
    - a temporal conv-deconv network to generate multi-level contextual facts;
    - a dynamic fact ensemble method to construct temporal representations dynamically for different questions.
- Datasets
  - TGIF-QA dataset.
  - The results significantly outperform the state of the art on all four tasks of TGIF-QA.
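A minimal pure-Python sketch of the multi-level contextual facts idea: a strided temporal convolution repeatedly downsamples the frame-level feature sequence, so each level summarizes a longer temporal context. The function names, the scalar features, and the smoothing kernel are illustrative stand-ins, not the paper's actual architecture (which also uses deconvolution to map levels back to a common resolution).

```python
# Illustrative sketch only: building a temporal pyramid of "contextual
# facts" from a frame-level feature sequence. Scalar features and the
# kernel are assumptions for readability, not the paper's design.

def conv1d(seq, kernel, stride):
    """Valid 1-D convolution over a list of scalar features."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(0, len(seq) - k + 1, stride)]

def multilevel_facts(features, levels=2):
    """Each successive level covers a wider temporal context."""
    kernel = [0.25, 0.5, 0.25]  # simple smoothing kernel (sums to 1)
    facts, cur = [list(features)], list(features)
    for _ in range(levels):
        cur = conv1d(cur, kernel, stride=2)  # halve temporal resolution
        facts.append(cur)
    return facts
```

A dynamic fact ensemble (the paper's third aspect) would then pick or weight these levels per question, since different questions need different temporal extents.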
Details
Introduction
- The model is built on the concepts of DMN/DMN+ and shares terminology with DMN, such as facts, memory, and attention.
- A video is converted into a sequence of motion and appearance features by a two-stream model [arxiv:1608.00797]. These features are then fed into a temporal convolutional and deconvolutional neural network to build multi-level contextual facts.
- These contextual facts are used as input facts to the memory networks.
- The co-memory networks hold two separate memory states, one for motion and one for appearance.
- A co-memory attention mechanism takes motion cues for appearance attention generation, and appearance cues for motion attention generation.
- A dynamic fact ensemble method produces temporal facts dynamically at each cycle of fact encoding.
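The co-memory attention idea above can be sketched in a few lines of pure Python: attention over one modality's facts is scored using the *other* modality's memory state. The scalar facts, dot-product scoring, and averaging memory update are hedged stand-ins for the paper's learned attention and GRU-style update networks.

```python
# Hedged sketch of co-memory attention: appearance attention is driven
# by the motion memory and vice versa. All names and the simple update
# rule are illustrative assumptions, not the paper's exact formulation.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(facts, cue):
    """Weight facts by a dot-product score against the cross-modal cue."""
    weights = softmax([f * cue for f in facts])
    return sum(w * f for w, f in zip(weights, facts))

def co_memory_step(app_facts, mot_facts, app_mem, mot_mem):
    app_ctx = attend(app_facts, mot_mem)  # motion cue -> appearance attention
    mot_ctx = attend(mot_facts, app_mem)  # appearance cue -> motion attention
    # simple averaging update (stand-in for a learned memory update)
    return 0.5 * (app_mem + app_ctx), 0.5 * (mot_mem + mot_ctx)
```

Running several such steps corresponds to the multiple memory-update cycles in DMN-style models, with the two memories steering each other's attention at every cycle.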
Contributions
Experiments