# explainable_ai_literature

A repository of summaries of recent explainable AI / interpretable ML approaches.

## Recent Publications in Explainable AI

### 2015

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission | KDD | 2015 | N/A | | |
| Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model | arXiv | 2015 | N/A | | |

### 2016

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Interpretable Decision Sets: A Joint Framework for Description and Prediction | KDD | 2016 | N/A | | |
| "Why Should I Trust You?": Explaining the Predictions of Any Classifier | KDD | 2016 | N/A | | |
| Towards A Rigorous Science of Interpretable Machine Learning | arXiv | 2017 | N/A | Review Paper | |

### 2017

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Transparency: Motivations and Challenges | arXiv | 2017 | N/A | Review Paper | |
| A Unified Approach to Interpreting Model Predictions | NeurIPS | 2017 | N/A | | |
| SmoothGrad: removing noise by adding noise | ICML (Workshop) | 2017 | Github | | |
| Axiomatic Attribution for Deep Networks | ICML | 2017 | N/A | | |
| Learning Important Features Through Propagating Activation Differences | ICML | 2017 | N/A | | |
| Understanding Black-box Predictions via Influence Functions | ICML | 2017 | N/A | | |
| Network Dissection: Quantifying Interpretability of Deep Visual Representations | CVPR | 2017 | N/A | | |

### 2018

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Explainable Prediction of Medical Codes from Clinical Text | ACL | 2018 | N/A | | |
| Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) | ICML | 2018 | N/A | | |
| Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR | HJTL | 2018 | N/A | | |
| Sanity Checks for Saliency Maps | NeurIPS | 2018 | N/A | | |
| Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions | AAAI | 2018 | N/A | | |
| The Mythos of Model Interpretability | arXiv | 2018 | N/A | Review Paper | |

### 2019

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Human Evaluation of Models Built for Interpretability | AAAI | 2019 | N/A | Human in the loop | |
| Data Shapley: Equitable Valuation of Data for Machine Learning | ICML | 2019 | N/A | | |
| Attention is not Explanation | ACL | 2019 | N/A | | |
| Actionable Recourse in Linear Classification | FAccT | 2019 | N/A | | |
| Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead | Nature | 2019 | N/A | | |
| Explanations can be manipulated and geometry is to blame | NeurIPS | 2019 | N/A | | |
| Learning Optimized Risk Scores | JMLR | 2019 | N/A | | |
| Explain Yourself! Leveraging Language Models for Commonsense Reasoning | ACL | 2019 | N/A | | |
| Deep Neural Networks Constrained by Decision Rules | AAAI | 2018 | N/A | | |

### 2020

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Interpreting the Latent Space of GANs for Semantic Face Editing | CVPR | 2020 | N/A | | |
| GANSpace: Discovering Interpretable GAN Controls | NeurIPS | 2020 | N/A | | |
| Explainability for fair machine learning | arXiv | 2020 | N/A | | |
| An Introduction to Circuits | Distill | 2020 | N/A | Tutorial | |
| Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses | NeurIPS | 2020 | N/A | | |
| Learning Model-Agnostic Counterfactual Explanations for Tabular Data | WWW | 2020 | N/A | | |
| Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods | AIES (AAAI) | 2020 | N/A | | |
| Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning | CHI | 2020 | N/A | Review Paper | |
| Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs | arXiv | 2020 | N/A | Review Paper | |
| Human-Driven FOL Explanations of Deep Learning | IJCAI | 2020 | N/A | Logic Explanations | |
| A Constraint-Based Approach to Learning and Explanation | AAAI | 2020 | N/A | Mutual Information | |

### 2021

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| A Learning Theoretic Perspective on Local Explainability | ICLR (Poster) | 2021 | N/A | | |
| Do Input Gradients Highlight Discriminative Features? | NeurIPS | 2021 | N/A | | |
| Explaining by Removing: A Unified Framework for Model Explanation | JMLR | 2021 | N/A | | |
| Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience | PACMHCI | 2021 | N/A | | |
| Towards Robust and Reliable Algorithmic Recourse | NeurIPS | 2021 | N/A | | |
| Algorithmic Recourse: from Counterfactual Explanations to Interventions | FAccT | 2021 | N/A | | |
| Manipulating and Measuring Model Interpretability | CHI | 2021 | N/A | | |
| Explainable Reinforcement Learning via Model Transforms | NeurIPS | 2021 | N/A | | |

### 2022

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| GlanceNets: Interpretable, Leak-proof Concept-based Models | CRL | 2022 | N/A | | |
| Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases | Transformer Circuits Thread | 2022 | N/A | Tutorial | |
| Can language models learn from explanations in context? | EMNLP | 2022 | N/A | DeepMind | |
| Interpreting Language Models with Contrastive Explanations | EMNLP | 2022 | N/A | | |
| Acquisition of Chess Knowledge in AlphaZero | PNAS | 2022 | N/A | DeepMind, GoogleBrain | |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | arXiv | 2022 | Github | | |
| Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis | AISTATS | 2022 | N/A | | |
| Use-Case-Grounded Simulations for Explanation Evaluation | NeurIPS | 2022 | N/A | | |
| The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective | arXiv | 2022 | N/A | | |
| What Makes a Good Explanation?: A Harmonized View of Properties of Explanations | arXiv | 2022 | N/A | | |
| NoiseGrad — Enhancing Explanations by Introducing Stochasticity to Model Weights | AAAI | 2022 | Github | | |
| Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations | AIES (AAAI) | 2022 | N/A | | |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models | arXiv | 2022 | Github | | |
| Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off | NeurIPS | 2022 | Github | CBM, CEM | |
| Self-explaining deep models with logic rule reasoning | NeurIPS | 2022 | N/A | | |
| What You See is What You Classify: Black Box Attributions | NeurIPS | 2022 | N/A | | |
| Concept Activation Regions: A Generalized Framework For Concept-Based Explanations | NeurIPS | 2022 | N/A | | |
| What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods | NeurIPS | 2022 | N/A | | |
| Scalable Interpretability via Polynomials | NeurIPS | 2022 | N/A | | |
| Learning to Scaffold: Optimizing Model Explanations for Teaching | NeurIPS | 2022 | N/A | | |
| Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF | NeurIPS | 2022 | N/A | | |
| WeightedSHAP: analyzing and improving Shapley based feature attribution | NeurIPS | 2022 | N/A | | |
| Visual correspondence-based explanations improve AI robustness and human-AI team accuracy | NeurIPS | 2022 | N/A | | |
| VICE: Variational Interpretable Concept Embeddings | NeurIPS | 2022 | N/A | | |
| Robust Feature-Level Adversaries are Interpretability Tools | NeurIPS | 2022 | N/A | | |
| ProtoX: Explaining a Reinforcement Learning Agent via Prototyping | NeurIPS | 2022 | N/A | | |
| ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model | NeurIPS | 2022 | N/A | | |
| Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability | NeurIPS | 2022 | N/A | | |
| Neural Basis Models for Interpretability | NeurIPS | 2022 | N/A | | |
| Implications of Model Indeterminacy for Explanations of Automated Decisions | NeurIPS | 2022 | N/A | | |
| Explainability Via Causal Self-Talk | NeurIPS | 2022 | N/A | DeepMind | |
| TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations | NeurIPS | 2022 | N/A | | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | NeurIPS | 2022 | N/A | GoogleBrain | |
| OpenXAI: Towards a Transparent Evaluation of Model Explanations | NeurIPS | 2022 | N/A | | |
| Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations | NeurIPS | 2022 | N/A | | |
| Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | EMNLP | 2022 | N/A | | |
| Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations | EMNLP | 2022 | N/A | | |
| MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | EMNLP | 2022 | N/A | | |
| Towards Interactivity and Interpretability: A Rationale-based Legal Judgment Prediction Framework | EMNLP | 2022 | N/A | | |
| Explainable Question Answering based on Semantic Graph by Global Differentiable Learning and Dynamic Adaptive Reasoning | EMNLP | 2022 | N/A | | |
| Faithful Knowledge Graph Explanations in Commonsense Question Answering | EMNLP | 2022 | N/A | | |
| Optimal Interpretable Clustering Using Oblique Decision Trees | KDD | 2022 | N/A | | |
| ExMeshCNN: An Explainable Convolutional Neural Network Architecture for 3D Shape Analysis | KDD | 2022 | N/A | | |
| Learning Differential Operators for Interpretable Time Series Modeling | KDD | 2022 | N/A | | |
| Compute Like Humans: Interpretable Step-by-step Symbolic Computation with Deep Neural Network | KDD | 2022 | N/A | | |
| Causal Attention for Interpretable and Generalizable Graph Classification | KDD | 2022 | N/A | | |
| Group-wise Reinforcement Feature Generation for Optimal and Explainable Representation Space Reconstruction | KDD | 2022 | N/A | | |
| Label-Free Explainability for Unsupervised Models | ICML | 2022 | N/A | | |
| Rethinking Attention-Model Explainability through Faithfulness Violation Test | ICML | 2022 | N/A | | |
| Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods | ICML | 2022 | N/A | | |
| A Functional Information Perspective on Model Interpretation | ICML | 2022 | N/A | | |
| Inducing Causal Structure for Interpretable Neural Networks | ICML | 2022 | N/A | | |
| ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder | ICML | 2022 | N/A | | |
| Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings | ICML | 2022 | N/A | | |
| Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism | ICML | 2022 | N/A | | |
| Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers | ICML | 2022 | N/A | | |
| Robust Models Are More Interpretable Because Attributions Look Normal | ICML | 2022 | N/A | | |
| Latent Diffusion Energy-Based Model for Interpretable Text Modelling | ICML | 2022 | N/A | | |

### 2023

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| On the Privacy Risks of Algorithmic Recourse | AISTATS | 2023 | N/A | | |
| Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten | ICML | 2023 | N/A | | |
| Tracr: Compiled Transformers as a Laboratory for Interpretability | arXiv | 2023 | Github | DeepMind | |
| Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse | ICLR | 2023 | N/A | | |
| Concept-level Debugging of Part-Prototype Networks | ICLR | 2023 | N/A | | |
| Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning | ICLR | 2023 | N/A | | |
| Re-calibrating Feature Attributions for Model Interpretation | ICLR | 2023 | N/A | | |
| Post-hoc Concept Bottleneck Models | ICLR | 2023 | N/A | | |
| Quantifying Memorization Across Neural Language Models | ICLR | 2023 | N/A | | |
| STREET: A Multi-Task Structured Reasoning and Explanation Benchmark | ICLR | 2023 | N/A | | |
| PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification | CVPR | 2023 | N/A | | |
| EVAL: Explainable Video Anomaly Localization | CVPR | 2023 | N/A | | |
| Overlooked Factors in Concept-based Explanations: Dataset Choice, Concept Learnability, and Human Capability | CVPR | 2023 | Github | | |
| Spatial-Temporal Concept Based Explanation of 3D ConvNets | CVPR | 2023 | Github | | |
| Adversarial Counterfactual Visual Explanations | CVPR | 2023 | N/A | | |
| Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification | CVPR | 2023 | N/A | | |
| Explaining Image Classifiers With Multiscale Directional Image Representation | CVPR | 2023 | N/A | | |
| CRAFT: Concept Recursive Activation FacTorization for Explainability | CVPR | 2023 | N/A | | |
| SketchXAI: A First Look at Explainability for Human Sketches | CVPR | 2023 | N/A | | |
| Don't Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis | CVPR | 2023 | N/A | | |
| Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning | CVPR | 2023 | N/A | | |
| Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification | CVPR | 2023 | N/A | | |
| Interpretable Neural-Symbolic Concept Reasoning | ICML | 2023 | Github | | |
| Identifying Interpretable Subspaces in Image Representations | ICML | 2023 | N/A | | |
| Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat | ICML | 2023 | Github | | |
| Explainability as statistical inference | ICML | 2023 | N/A | | |
| On the Impact of Knowledge Distillation for Model Interpretability | ICML | 2023 | N/A | | |
| NA2Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning | ICML | 2023 | N/A | | |
| Explaining Reinforcement Learning with Shapley Values | ICML | 2023 | N/A | | |
| Explainable Data-Driven Optimization: From Context to Decision and Back Again | ICML | 2023 | N/A | | |
| Causal Proxy Models for Concept-based Model Explanations | ICML | 2023 | N/A | | |
| Learning Perturbations to Explain Time Series Predictions | ICML | 2023 | N/A | | |
| Rethinking Explaining Graph Neural Networks via Non-parametric Subgraph Matching | ICML | 2023 | N/A | | |
| Representer Point Selection for Explaining Regularized High-dimensional Models | ICML | 2023 | N/A | | |
| Towards Explaining Distribution Shifts | ICML | 2023 | N/A | | |
| Relevant Walk Search for Explaining Graph Neural Networks | ICML | 2023 | Github | | |
| Concept-based Explanations for Out-of-Distribution Detectors | ICML | 2023 | N/A | | |
| GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations | ICML | 2023 | N/A | | |
| Robust Explanation for Free or At the Cost of Faithfulness | ICML | 2023 | N/A | | |
| Learn to Accumulate Evidence from All Training Samples: Theory and Practice | ICML | 2023 | N/A | | |
| Towards Trustworthy Explanation: On Causal Rationalization | ICML | 2023 | N/A | | |
| Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables | ICML | 2023 | N/A | | |
| Probabilistic Concept Bottleneck Models | ICML | 2023 | N/A | | |
| What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective | ICML | 2023 | N/A | | |
| Towards credible visual model interpretation with path attribution | ICML | 2023 | N/A | | |
| Trainability, Expressivity and Interpretability in Gated Neural ODEs | ICML | 2023 | N/A | | |
| Discover and Cure: Concept-aware Mitigation of Spurious Correlation | ICML | 2023 | N/A | | |
| PWSHAP: A Path-Wise Explanation Model for Targeted Variables | ICML | 2023 | N/A | | |
| A Closer Look at the Intervention Procedure of Concept Bottleneck Models | ICML | 2023 | N/A | | |
| Rethinking Interpretation: Input-Agnostic Saliency Mapping of Deep Visual Classifiers | AAAI | 2023 | N/A | | |
| TopicFM: Robust and Interpretable Topic-Assisted Feature Matching | AAAI | 2023 | N/A | | |
| Solving Explainability Queries with Quantification: The Case of Feature Relevancy | AAAI | 2023 | N/A | | |
| PEN: Prediction-Explanation Network to Forecast Stock Price Movement with Better Explainability | AAAI | 2023 | N/A | | |
| KerPrint: Local-Global Knowledge Graph Enhanced Diagnosis Prediction for Retrospective and Prospective Interpretations | AAAI | 2023 | N/A | | |
| Beyond Graph Convolutional Network: An Interpretable Regularizer-Centered Optimization Framework | AAAI | 2023 | N/A | | |
| Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling | AAAI | 2023 | N/A | | |
| Learning Interpretable Temporal Properties from Positive Examples Only | AAAI | 2023 | N/A | | |
| Symbolic Metamodels for Interpreting Black-Boxes Using Primitive Functions | AAAI | 2023 | N/A | | |
| Towards More Robust Interpretation via Local Gradient Alignment | AAAI | 2023 | N/A | | |
| Towards Fine-Grained Explainability for Heterogeneous Graph Neural Network | AAAI | 2023 | N/A | | |
| XClusters: Explainability-First Clustering | AAAI | 2023 | N/A | | |
| Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis | AAAI | 2023 | N/A | | |
| Fairness and Explainability: Bridging the Gap towards Fair Model Explanations | AAAI | 2023 | N/A | | |
| Explaining Model Confidence Using Counterfactuals | AAAI | 2023 | N/A | | |
| SEAT: Stable and Explainable Attention | AAAI | 2023 | N/A | | |
| Factual and Informative Review Generation for Explainable Recommendation | AAAI | 2023 | N/A | | |
| Improving Interpretability via Explicit Word Interaction Graph Layer | AAAI | 2023 | N/A | | |
| Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing | AAAI | 2023 | N/A | | |
| Improving Interpretability of Deep Sequential Knowledge Tracing Models with Question-centric Cognitive Representations | AAAI | 2023 | N/A | | |
| Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe | AAAI | 2023 | N/A | | |
| eForecaster: Unifying Electricity Forecasting with Robust, Flexible, and Explainable Machine Learning Algorithms | AAAI | 2023 | N/A | | |
| SolderNet: Towards Trustworthy Visual Inspection of Solder Joints in Electronics Manufacturing Using Explainable Artificial Intelligence | AAAI | 2023 | N/A | | |
| Xaitk-Saliency: An Open Source Explainable AI Toolkit for Saliency | AAAI | 2023 | N/A | | |
| Ripple: Concept-Based Interpretation for Raw Time Series Models in Education | AAAI | 2023 | N/A | | |
| Semantics, Ontology and Explanation | arXiv | 2023 | N/A | Ontological Unpacking | |