AIX_BDML

Methods for interpreting and understanding deep neural networks, 2018

Interpretation

  • Activation Maximization (AM)
  • AM + expert
  • AM in code space
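
A minimal sketch of activation maximization, assuming a toy logistic classifier rather than a deep network: gradient ascent on the input maximizes log p(class | x) minus an L2 penalty on x. The weights `W`, bias `B`, penalty `LAM`, step count, and learning rate are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical toy classifier: p(class | x) = sigmoid(W . x + B)
W = [2.0, -1.0, 0.5]
B = -0.5
LAM = 0.1  # assumed strength of the L2 penalty on the input


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def objective(x):
    """log p(class | x) - LAM * ||x||^2"""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return math.log(sigmoid(z)) - LAM * sum(xi * xi for xi in x)


def objective_grad(x):
    """Gradient of the objective with respect to the input x."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    # d/dx log sigmoid(z) = (1 - sigmoid(z)) * W
    return [(1.0 - sigmoid(z)) * w - 2.0 * LAM * xi for w, xi in zip(W, x)]


def activation_maximization(x0, steps=200, lr=0.1):
    """Plain gradient ascent on the input, starting from x0."""
    x = list(x0)
    for _ in range(steps):
        g = objective_grad(x)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x


x_start = [0.0, 0.0, 0.0]
x_star = activation_maximization(x_start)
print(x_star, objective(x_star))
```

The resulting `x_star` is a prototype input for the class. The "AM + expert" and "AM in code space" variants would swap the L2 penalty for a learned data density or optimize in the latent space of a generative model, which this sketch does not attempt.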

Explanation

pooling

  • sensitivity
  • simple Taylor decomposition
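
The two methods above can be contrasted on a toy example. This is a sketch under stated assumptions: a hypothetical two-layer ReLU network without biases (weights `W1`, `W2_ROW` are made up), sensitivity taken as the squared partial derivative, and simple Taylor decomposition taken with a root point near the origin, which for a bias-free ReLU network reduces to gradient times input.

```python
# Hypothetical bias-free ReLU network: f(x) = W2_ROW . relu(W1 x)
W1 = [[1.0, -2.0, 0.5],
      [-1.0, 1.0, 2.0]]   # W1[j][i]: input i -> hidden unit j
W2_ROW = [1.0, -0.5]      # hidden unit j -> scalar output


def forward(x):
    """Return the output f(x) and the hidden pre-activations."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    out = sum(v * h for v, h in zip(W2_ROW, hidden))
    return out, pre


def gradient(x):
    """Partial derivatives df/dx_i, computed by hand for this network."""
    _, pre = forward(x)
    active = [1.0 if p > 0 else 0.0 for p in pre]  # ReLU derivative
    return [
        sum(W2_ROW[j] * active[j] * W1[j][i] for j in range(len(W1)))
        for i in range(len(x))
    ]


def sensitivity(x):
    """Sensitivity analysis: R_i = (df/dx_i)^2."""
    return [g * g for g in gradient(x)]


def simple_taylor(x):
    """Simple Taylor decomposition with a root point near the origin:
    R_i = df/dx_i * x_i (valid for this bias-free ReLU network)."""
    return [g * xi for g, xi in zip(gradient(x), x)]


x = [1.0, 0.5, 2.0]
f_x, _ = forward(x)
print("f(x)          =", f_x)
print("sensitivity   =", sensitivity(x))
print("simple Taylor =", simple_taylor(x))
```

Sensitivity scores describe how strongly the output reacts to a change in each input, while the simple Taylor scores sum to f(x) itself in this setting, so they decompose the prediction rather than its variation.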

filtering

  • deconvolution
  • extension: guided backprop

pooling & filtering

  • LRP
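
A rough sketch of layer-wise relevance propagation (LRP), assuming the basic proportional redistribution rule with a small stabilizer (often called the epsilon rule) and a hypothetical bias-free two-layer ReLU network. The weights, the input, and `EPS` are illustrative assumptions, and the helper `lrp_dense` is not from any library.

```python
EPS = 1e-9  # assumed stabilizer to avoid division by zero


def stabilize(z):
    """Push z slightly away from zero while keeping its sign."""
    return z + (EPS if z >= 0 else -EPS)


def lrp_dense(a_in, weights, r_out):
    """Redistribute relevance r_out of a dense layer onto its inputs.

    weights[j][i] connects input i to output j. Each input receives a
    share proportional to its contribution a_in[i] * weights[j][i].
    """
    z = [sum(w * a for w, a in zip(row, a_in)) for row in weights]
    r_in = [0.0] * len(a_in)
    for j, row in enumerate(weights):
        s = r_out[j] / stabilize(z[j])
        for i, w in enumerate(row):
            r_in[i] += a_in[i] * w * s
    return r_in


# Hypothetical network: output = W2 relu(W1 x)
W1 = [[1.0, -2.0, 0.5],
      [-1.0, 1.0, 2.0]]
W2 = [[1.0, -0.5]]

x = [1.0, 0.5, 2.0]
hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
output = [sum(w * h for w, h in zip(row, hidden)) for row in W2]

# Start with the output as total relevance, then propagate backwards.
r_hidden = lrp_dense(hidden, W2, output)
r_input = lrp_dense(x, W1, r_hidden)
print("output    =", output[0])
print("relevance =", r_input, "sum =", sum(r_input))
```

In this setting the input relevances sum (up to the stabilizer) to the network output, which is the conservation property that motivates LRP.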

Explaining the black-box model: A survey of local interpretation methods for deep neural networks, 2021.01

data-driven

perturbation-based

adversarial-based

concept-based
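
To make the perturbation-based family concrete, here is a minimal occlusion sketch: each input feature is replaced by a baseline value and the drop in the model score is recorded as its attribution. The scoring function `model`, the input, and the zero baseline are hypothetical, and this is only one simple member of the family.

```python
def model(x):
    """Hypothetical black-box scoring function (only queried, never opened)."""
    return 3.0 * x[0] - 2.0 * x[1] + x[0] * x[2]


def occlusion_attribution(f, x, baseline=0.0):
    """Attribution of feature i = f(x) - f(x with feature i set to baseline)."""
    base_score = f(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        scores.append(base_score - f(perturbed))
    return scores


x = [1.0, 2.0, 4.0]
attributions = occlusion_attribution(model, x)
print(attributions)
```

Because the method only needs model outputs, it applies to any model, at the cost of one extra forward pass per occluded feature.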

model-driven

gradient-based

correlation-score

class activation map

On Interpretability of Artificial Neural Networks: A Survey, 2021.05

Post-Hoc

Feature Analysis

Model Inspection

Saliency

Proxy

Advanced Mathematical/Physical Analysis

Explaining-by-Case

  • K-Nearest Neighbor Algorithm
  • Counterfactual case
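
A small sketch of explaining by case, assuming a hypothetical labelled dataset `TRAIN` and Euclidean distance. The nearest training cases serve as supporting examples for a prediction. The nearest case with a different label is used here as a crude stand-in for a counterfactual case, which in general would be a minimally changed input that flips the prediction.

```python
import math
from collections import Counter

# Hypothetical labelled training cases: (feature vector, label)
TRAIN = [
    ([1.0, 1.0], "cat"),
    ([1.2, 0.8], "cat"),
    ([4.0, 4.2], "dog"),
    ([3.8, 4.0], "dog"),
    ([4.1, 3.9], "dog"),
]


def nearest_cases(query, cases, k=3):
    """The k training cases closest to the query (Euclidean distance)."""
    return sorted(cases, key=lambda c: math.dist(query, c[0]))[:k]


def predict(query, cases, k=3):
    """Majority label among the k nearest cases."""
    labels = [label for _, label in nearest_cases(query, cases, k)]
    return Counter(labels).most_common(1)[0][0]


def nearest_other_label(query, cases, predicted_label):
    """Closest training case whose label differs from the prediction."""
    others = [c for c in cases if c[1] != predicted_label]
    return min(others, key=lambda c: math.dist(query, c[0]))


query = [3.9, 4.1]
label = predict(query, TRAIN)
support = nearest_cases(query, TRAIN)
contrast = nearest_other_label(query, TRAIN, label)
print("prediction:", label)
print("supporting cases:", support)
print("contrasting case:", contrast)
```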

Explaining-by-Text

  • Neural image captioning
    • CNN + bidirectional RNN
    • CNN + attention-RNN

Ad-Hoc

Interpretable Representation

Model Renovation

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond, 2021.05

types of models

  • Model-agnostic
  • Differentiable model
  • Specific model

representations of interpretation

  • Feature (Importance)
  • Model Response
  • Model Rationale Process
  • Dataset

relation between the interpretation algorithm and the model

  • Closed-form
  • Composition
  • Dependence
  • Proxy