Interpretable Visual Reasoning via Induced Symbolic Space

This repository hosts the code for OCCAM (Object-Centric Compositional Attention Model) presented in the following paper:

Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson and Humphrey Shi, Interpretable Visual Reasoning via Induced Symbolic Space, arXiv preprint arXiv:2011.11603.

Note: our code will be released soon; stay tuned.

Introduction

Our proposed OCCAM framework performs pure object-level reasoning and achieves a new state of the art on the CLEVR dataset without human-annotated functional programs. The framework makes object-word co-occurrence information available, which enables induction of concepts and super concepts based on the inclusiveness and mutual exclusiveness of words' visual mappings. When operating on the induced concepts instead of visual features, OCCAM achieves comparable performance, demonstrating the accuracy and sufficiency of the induced concepts.
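For illustration, below is a minimal, hypothetical sketch of how concept structure could be induced from object-word co-occurrence statistics. It is not the released OCCAM code; the function names, the word_to_objects input, and the tol threshold are assumptions made for this example. Each word is mapped to the set of objects it co-occurs with; near-disjoint mappings are treated as sibling concepts under a common super concept, and a mapping that contains another is treated as its super concept.

```python
# Illustrative sketch only (not the released OCCAM code): induce concept
# structure from object-word co-occurrence. All names and thresholds here
# are hypothetical placeholders.

from itertools import combinations


def overlap(a: set, b: set) -> float:
    """Fraction of the smaller set covered by the intersection."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def induce_structure(word_to_objects: dict, tol: float = 0.05):
    """Group words into concepts/super concepts from their visual mappings.

    word_to_objects maps each word to the set of object ids it co-occurs with.
    Two words whose object sets are (almost) mutually exclusive are treated as
    sibling concepts; a word whose mapping (almost) contains another word's
    mapping is treated as that word's super concept.
    """
    words = list(word_to_objects)
    inclusions = []       # (super_word, sub_word) pairs
    exclusive_pairs = []  # candidate sibling concepts under one super concept

    for w1, w2 in combinations(words, 2):
        s1, s2 = word_to_objects[w1], word_to_objects[w2]
        if overlap(s1, s2) <= tol:
            exclusive_pairs.append((w1, w2))   # mutually exclusive -> siblings
        elif len(s2 - s1) / max(len(s2), 1) <= tol:
            inclusions.append((w1, w2))        # s1 (almost) includes s2
        elif len(s1 - s2) / max(len(s1), 1) <= tol:
            inclusions.append((w2, w1))        # s2 (almost) includes s1

    return exclusive_pairs, inclusions


if __name__ == "__main__":
    # Toy groundings with arbitrary object ids, for demonstration only.
    mappings = {
        "red":    {1, 2, 3},
        "blue":   {4, 5, 6},
        "metal":  {1, 2, 4, 5},
        "chrome": {1, 4},
    }
    siblings, supers = induce_structure(mappings)
    print("mutually exclusive:", siblings)  # [('red', 'blue')] -> same attribute
    print("inclusion:", supers)             # [('metal', 'chrome')] -> super concept
```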

Results

The table below compares our object-level compositional reasoning framework with state-of-the-art methods on CLEVR. * indicates that the method uses external program annotations.

| Method | Overall | Count | Exist | Compare Numbers | Query Attribute | Compare Attribute |
|---|---|---|---|---|---|---|
| Human | 92.6 | 86.7 | 96.6 | 86.5 | 95.0 | 96.0 |
| NMN* | 72.1 | 52.5 | 72.7 | 79.3 | 79.0 | 78.0 |
| N2NMN* | 83.7 | 68.5 | 85.7 | 84.9 | 90.0 | 88.7 |
| IEP* | 96.9 | 92.7 | 97.1 | 98.7 | 98.1 | 98.9 |
| TbD* | 99.1 | 97.6 | 99.4 | 99.2 | 99.5 | 99.6 |
| NS-VQA* | 99.8 | 99.7 | 99.9 | 99.9 | 99.8 | 99.8 |
| RN | 95.5 | 90.1 | 93.6 | 97.8 | 97.1 | 97.9 |
| FiLM | 97.6 | 94.5 | 93.8 | 99.2 | 99.2 | 99.0 |
| MAC | 98.9 | 97.2 | 99.4 | 99.5 | 99.3 | 99.5 |
| NS-CL | 98.9 | 98.2 | 99.0 | 98.8 | 99.3 | 99.1 |
| OCCAM (ours) | 99.4 | 98.1 | 99.8 | 99.0 | 99.9 | 99.9 |

Bibtex

@article{wang2020interpretable,
  title={Interpretable Visual Reasoning via Induced Symbolic Space},
  author={Wang, Zhonghao and Yu, Mo and Wang, Kai and Xiong, Jinjun and Hwu, Wen-mei and Hasegawa-Johnson, Mark and Shi, Humphrey},
  journal={arXiv preprint arXiv:2011.11603},
  year={2020}
}