/advanced-XAI-for-DeepLearning

Here I gather promising research directions for making DNNs interpretable.

This hub collects ADVANCED and PROMISING directions in XAI for Deep Neural Networks.

There have been many approaches to achieving interpretability in DL; however, only a few research directions look promising as of 2021. Therefore, I only consider NEW and likely PROMISING directions here.

Categories

Below is the layout of this hub:

  • Network conceptualization

  • Prototype-based explanations

  • Inherently-interpretable DNNs

  • Evaluation of explanations on downstream tasks

  • Interpreting Large Foundation Models (e.g. LLMs)

  • Interactive XAI

Other categorizations are reasonable as well (e.g. those from Anh Nguyen, Molnar, or lopusz). However, I'd like to curate my own layout.

I also like this distinction (at 1:45) between Explainable ML and Interpretable ML by Cynthia Rudin.

Network conceptualization

This line of research assigns human-understandable concepts to the concepts learned by DNNs, which can make explanations more human-friendly and specific. I have only picked a few representative papers here; please contribute if you know of others. A small concept-bottleneck sketch follows the first list below.

  • Concept Bottleneck Models https://proceedings.mlr.press/v119/koh20a.html

  • Codebook Features: Sparse and Discrete Interpretability for Neural Networks https://arxiv.org/pdf/2310.17230.pdf

  • Backpack Language Models https://arxiv.org/abs/2305.16765

  • Network Dissection: Quantifying Interpretability of Deep Visual Representations (CVPR2017) - review

  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (ICML2018) - review

  • Towards Automatic Concept-based Explanations (NeurIPS2019) - review

  • MILAN - Natural Language Descriptions of Deep Visual Features (ICLR2022) - paper

  • LAVISE - Explaining Deep Convolutional Neural Networks via Unsupervised Visual-Semantic Filter Attention (CVPR2022) - paper

  • DISSECT: Disentangled Simultaneous Explanations via Concept Traversals (ICLR2022) - paper
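
Below is a minimal, hedged sketch of the concept-bottleneck idea in PyTorch. The layer sizes, names, and toy batch are my own illustration, not the reference implementation of any paper listed here: the model predicts human-annotated concepts first and derives the label only from those concept predictions, so every concept activation can be inspected or intervened on.

```python
# Minimal concept-bottleneck-style classifier (illustrative sketch only).
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Backbone mapping raw features to concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        # Label head sees *only* the (sigmoid) concept predictions.
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.concept_net(x)
        concepts = torch.sigmoid(concept_logits)
        return concept_logits, self.label_net(concepts)

# Joint training: supervise the bottleneck with concept labels and the head
# with class labels (equal weighting here purely for simplicity).
model = ConceptBottleneck(in_dim=64, n_concepts=10, n_classes=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 64)                      # toy input batch
c = torch.randint(0, 2, (32, 10)).float()    # binary concept annotations
y = torch.randint(0, 5, (32,))               # class labels

concept_logits, label_logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(concept_logits, c) \
     + nn.functional.cross_entropy(label_logits, y)
opt.zero_grad(); loss.backward(); opt.step()
```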

The following works try to mine learned concepts from pretrained models (a small activation-factorization sketch follows this list):

  • Craft: Concept recursive activation factorization for explainability

  • A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation

  • COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks
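
As a rough illustration of activation-factorization-based concept mining in the spirit of the works above (the shapes, the random "activations", and all hyperparameters are purely hypothetical), one can factorize a layer's non-negative activations with NMF and treat each factor as a candidate concept:

```python
# Concept mining by factorizing a layer's activations (illustrative sketch).
import numpy as np
from sklearn.decomposition import NMF

# Pretend these are non-negative post-ReLU activations of one layer,
# collected over 500 image patches, each with 256 channels.
activations = np.random.rand(500, 256)

# Factorize A ~ U @ W: each row of W is a candidate "concept" direction in
# activation space, and U[i, k] says how strongly patch i expresses concept k.
nmf = NMF(n_components=10, init="nndsvda", random_state=0, max_iter=500)
U = nmf.fit_transform(activations)   # (500, 10) concept coefficients
W = nmf.components_                  # (10, 256) concept directions

# To visualize concept k, one would show the patches with the largest U[:, k].
top_patches_for_concept_0 = np.argsort(U[:, 0])[::-1][:5]
print("patches that most express concept 0:", top_patches_for_concept_0)
```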

Prototype-based explanations

This line of research explains a DNN's decisions through prototypes (or examples), which makes such approaches inherently difficult to evaluate quantitatively. A small prototype-similarity sketch follows the list below.

  • This Looks Like That: Deep Learning for Interpretable Image Recognition (NeurIPS2019) - review
  • This Looks Like It Rather Than That: ProtoKNN For Similarity-Based Classifiers https://openreview.net/forum?id=lh-HRYxuoRr
  • Neural Prototype Trees for Interpretable Fine-grained Image Recognition https://arxiv.org/abs/2012.02046
  • Explaining Latent Representations with a Corpus of Examples (NeurIPS2021)
  • A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification (Trans. Mach. Learn. Res. 2022)
  • Visual correspondence-based explanations improve AI robustness and human-AI team accuracy https://arxiv.org/abs/2208.00780
  • AdvisingNets: Learning to Distinguish Correct and Wrong Classifications via Nearest-Neighbor Explanations https://arxiv.org/pdf/2308.13651.pdf
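
Here is a minimal sketch of a prototype-similarity classifier in the spirit of the "this looks like that" family; the similarity function follows that idea, but all sizes and names are my own illustration, not any paper's official code. Each class score is a weighted sum of similarities to learned prototype vectors, so a prediction can be explained by showing the training patches closest to the most active prototypes.

```python
# Prototype-similarity classifier (illustrative sketch only).
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    def __init__(self, feat_dim: int, n_prototypes: int, n_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
        # Linear layer over similarities; its weights say which prototypes
        # provide evidence for which class.
        self.classifier = nn.Linear(n_prototypes, n_classes, bias=False)

    def forward(self, feats):                      # feats: (batch, feat_dim)
        # Squared L2 distance to every prototype, turned into a similarity
        # that grows as the distance shrinks.
        d2 = torch.cdist(feats, self.prototypes).pow(2)
        sim = torch.log((d2 + 1.0) / (d2 + 1e-4))
        return self.classifier(sim), sim

model = PrototypeClassifier(feat_dim=512, n_prototypes=20, n_classes=10)
feats = torch.randn(4, 512)                        # e.g. CNN embeddings
logits, sim = model(feats)
# Explanation for sample 0: the indices of its most similar prototypes.
top_protos = sim[0].topk(3).indices
print("predicted class:", logits[0].argmax().item(),
      "supported by prototypes:", top_protos.tolist())
```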

Inherently-interpretable DNNs

This line of research turns existing black-box DNNs (e.g. VGG or ResNet) into white-box models by modifying their architecture or training so that they behave in a human-understandable manner. A small sketch appears after the list below.

  • Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead (Nature Machine Intelligence) - review

  • Concept Whitening for Interpretable Image Recognition (Nature Machine Intelligence) - review

  • Exploring the cloud of variable importance for the set of all good models (Nature Machine Intelligence) - review

  • This Looks Like That: Deep Learning for Interpretable Image Recognition (NeurIPS2019) - review

  • Visual correspondence-based explanations improve AI robustness and human-AI team accuracy (NeurIPS2022) - paper

  • B-cos Networks: Alignment is All We Need for Interpretability (CVPR2022) - paper

  • Neural Additive Models: Interpretable Machine Learning with Neural Nets (NeurIPS2021) - paper
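
As one concrete example of an inherently-interpretable architecture, here is a minimal neural-additive-model-style sketch (hidden sizes and names are my own, not the authors' code): each input feature passes through its own small subnetwork and the outputs are summed, so every feature's contribution to the prediction can be read off directly.

```python
# Neural-additive-model-style architecture (illustrative sketch only).
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        # One small subnetwork per input feature.
        self.feature_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        ])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                          # x: (batch, n_features)
        # Per-feature contributions, shape (batch, n_features).
        contribs = torch.cat(
            [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)],
            dim=1,
        )
        return contribs.sum(dim=1) + self.bias, contribs

model = NeuralAdditiveModel(n_features=8)
x = torch.randn(16, 8)
pred, contribs = model(x)                          # prediction (e.g. a logit) + explanation
print("feature contributions for sample 0:", contribs[0].detach().tolist())
```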

Evaluation of explanations on downstream tasks

Since humans are the target end-users of explanations, this line of research investigates how effective explanations actually are for humans in various decision-making tasks.

Interpreting Large Foundation Models

Interactive XAI (added on Mar 16 2024)

  • Explaining decision-making algorithms through UI: Strategies to help non-expert stakeholders
  • An Interactive UI to Support Sensemaking over Collections of Parallel Texts
  • Rethinking Explainability as a Dialogue: A Practitioner's Perspective
  • May I Ask a Follow-up Question? Understanding the Benefits of Conversations in Neural Network Explainability
  • Explaining machine learning models with interactive natural language conversations using TalkToModel
  • Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy