
Motion-Appearance Co-Memory Networks for Video Question Answering


Abstract

  • Three unique attributes of video QA compared with image QA:
    • it deals with long sequences of images;
    • motion and appearance information are usually correlated and provide useful attention cues;
    • different questions require different numbers of frames to infer the answer.
  • Proposed Method
    • Builds on concepts from Dynamic Memory Networks (DMN) and introduces new mechanisms.
    • Three salient aspects:
      • a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention;
      • a temporal conv-deconv network that generates multi-level contextual facts;
      • a dynamic fact ensemble method that constructs temporal representations dynamically for different questions.
  • Datasets
    • TGIF-QA dataset.
    • The results outperform the state of the art significantly on all four tasks of TGIF-QA.

Details

  • Introduction

    • The model is built on concepts from DMN/DMN+ and shares the same terminology (facts, memory, attention).
    • A video is converted into sequences of motion and appearance features by a two-stream model [arxiv:1608.00797]. These features are then fed into a temporal convolutional and deconvolutional neural network to build multi-level contextual facts.
    • These contextual facts serve as the input facts to the memory networks.
    • The co-memory networks hold two separate memory states, one for motion and one for appearance.
    • A co-memory attention mechanism takes motion cues to generate appearance attention, and appearance cues to generate motion attention.
    • A dynamic fact ensemble method produces temporal facts dynamically at each cycle of fact encoding.
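The conv-deconv fact pipeline above can be sketched with a crude numpy stand-in. Note this is illustrative only: average pooling replaces the paper's learned temporal convolutions, nearest-neighbor upsampling replaces its deconvolutions, and the level count is an assumption.

```python
import numpy as np

def temporal_conv(x, stride=2):
    # stand-in for a strided temporal conv: average adjacent frames
    T = (x.shape[0] // stride) * stride
    return x[:T].reshape(-1, stride, x.shape[1]).mean(axis=1)

def temporal_deconv(x, factor):
    # stand-in for a temporal deconv: nearest-neighbor upsampling along time
    return np.repeat(x, factor, axis=0)

def multi_level_facts(frame_feats, levels=3):
    # downsample into a temporal pyramid, then upsample every level back
    # to the original length, yielding one fact tensor per contextual level
    pyramid, cur = [frame_feats], frame_feats
    for _ in range(levels - 1):
        cur = temporal_conv(cur)
        pyramid.append(cur)
    T = frame_feats.shape[0]
    return [temporal_deconv(f, 2 ** i)[:T] for i, f in enumerate(pyramid)]
```

The same pipeline would run once on motion features and once on appearance features, giving each stream its own multi-level facts.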
  • Contributions

    • General Dynamic Memory Networks:
      • Fact module
      • Question module
      • Episodic memory network
      • Answer module
    • Motion-Appearance Co-Memory Networks
      • multi-level contextual facts
      • co-memory module
      • answer module
      • question module [remains the same as the one in traditional DMN]
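One cycle of the co-memory module can be sketched in numpy under simplifying assumptions: dot-product scoring stands in for the paper's learned attention gating, and a fixed convex-combination update (the 0.5 weight is arbitrary) stands in for its GRU-style memory update. All function names here are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(facts, memory, question):
    # score each fact against the current memory and the question,
    # then return the attention-weighted context vector
    weights = softmax(facts @ (memory + question))
    return weights @ facts

def co_memory_step(app_facts, mot_facts, m_app, m_mot, q):
    # co-memory attention: the motion memory guides appearance attention,
    # and the appearance memory guides motion attention
    ctx_app = attend(app_facts, m_mot, q)
    ctx_mot = attend(mot_facts, m_app, q)
    # simplified memory update (a learned GRU update in the paper)
    return 0.5 * m_app + 0.5 * ctx_app, 0.5 * m_mot + 0.5 * ctx_mot
```

Running this step for several cycles, then feeding the final appearance and motion memories (together with the question encoding) to the answer module, mirrors the overall episodic-memory loop the notes describe.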
  • Experiments

Personal Thoughts