# explainable_ai_literature

A repository of summaries of recent explainable AI / interpretable ML approaches.

## Recent Publications in Explainable AI

### 2015

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission | KDD | 2015 | N/A | | |
| Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model | arXiv | 2015 | N/A | | |

### 2016

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Interpretable Decision Sets: A Joint Framework for Description and Prediction | KDD | 2016 | N/A | | |
| "Why Should I Trust You?": Explaining the Predictions of Any Classifier | KDD | 2016 | N/A | | |
| Towards A Rigorous Science of Interpretable Machine Learning | arXiv | 2017 | N/A | Review Paper | |

### 2017

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Transparency: Motivations and Challenges | arXiv | 2017 | N/A | Review Paper | |
| A Unified Approach to Interpreting Model Predictions | NeurIPS | 2017 | N/A | | |
| SmoothGrad: removing noise by adding noise | ICML (Workshop) | 2017 | Github | | |
| Axiomatic Attribution for Deep Networks | ICML | 2017 | N/A | | |
| Learning Important Features Through Propagating Activation Differences | ICML | 2017 | N/A | | |
| Understanding Black-box Predictions via Influence Functions | ICML | 2017 | N/A | | |
| Network Dissection: Quantifying Interpretability of Deep Visual Representations | CVPR | 2017 | N/A | | |

### 2018

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Explainable Prediction of Medical Codes from Clinical Text | ACL | 2018 | N/A | | |
| Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) | ICML | 2018 | N/A | | |
| Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR | HJTL | 2018 | N/A | | |
| Sanity Checks for Saliency Maps | NeurIPS | 2018 | N/A | | |
| Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions | AAAI | 2018 | N/A | | |
| The Mythos of Model Interpretability | arXiv | 2018 | N/A | Review Paper | |

### 2019

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Human Evaluation of Models Built for Interpretability | AAAI | 2019 | N/A | Human in the loop | |
| Data Shapley: Equitable Valuation of Data for Machine Learning | ICML | 2019 | N/A | | |
| Attention is not Explanation | ACL | 2019 | N/A | | |
| Actionable Recourse in Linear Classification | FAccT | 2019 | N/A | | |
| Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead | Nature | 2019 | N/A | | |
| Explanations can be manipulated and geometry is to blame | NeurIPS | 2019 | N/A | | |
| Learning Optimized Risk Scores | JMLR | 2019 | N/A | | |
| Explain Yourself! Leveraging Language Models for Commonsense Reasoning | ACL | 2019 | N/A | | |
| Deep Neural Networks Constrained by Decision Rules | AAAI | 2018 | N/A | | |

### 2020

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| Interpreting the Latent Space of GANs for Semantic Face Editing | CVPR | 2020 | N/A | | |
| GANSpace: Discovering Interpretable GAN Controls | NeurIPS | 2020 | N/A | | |
| Explainability for fair machine learning | arXiv | 2020 | N/A | | |
| An Introduction to Circuits | Distill | 2020 | N/A | Tutorial | |
| Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses | NeurIPS | 2020 | N/A | | |
| Learning Model-Agnostic Counterfactual Explanations for Tabular Data | WWW | 2020 | N/A | | |
| Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods | AIES (AAAI) | 2020 | N/A | | |
| Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning | CHI | 2020 | N/A | Review Paper | |
| Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs | arXiv | 2020 | N/A | Review Paper | |
| Human-Driven FOL Explanations of Deep Learning | IJCAI | 2020 | N/A | Logic Explanations | |
| A Constraint-Based Approach to Learning and Explanation | AAAI | 2020 | N/A | Mutual Information | |

### 2021

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| A Learning Theoretic Perspective on Local Explainability | ICLR (Poster) | 2021 | N/A | | |
| Do Input Gradients Highlight Discriminative Features? | NeurIPS | 2021 | N/A | | |
| Explaining by Removing: A Unified Framework for Model Explanation | JMLR | 2021 | N/A | | |
| Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience | PACMHCI | 2021 | N/A | | |
| Towards Robust and Reliable Algorithmic Recourse | NeurIPS | 2021 | N/A | | |
| Algorithmic Recourse: from Counterfactual Explanations to Interventions | FAccT | 2021 | N/A | | |
| Manipulating and Measuring Model Interpretability | CHI | 2021 | N/A | | |
| Explainable Reinforcement Learning via Model Transforms | NeurIPS | 2021 | N/A | | |

### 2022

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| GlanceNets: Interpretable, Leak-proof Concept-based Models | CRL | 2022 | N/A | | |
| Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases | Transformer Circuits Thread | 2022 | N/A | Tutorial | |
| Can language models learn from explanations in context? | EMNLP | 2022 | N/A | DeepMind | |
| Interpreting Language Models with Contrastive Explanations | EMNLP | 2022 | N/A | | |
| Acquisition of Chess Knowledge in AlphaZero | PNAS | 2022 | N/A | DeepMind, GoogleBrain | |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | arXiv | 2022 | Github | | |
| Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis | AISTATS | 2022 | N/A | | |
| Use-Case-Grounded Simulations for Explanation Evaluation | NeurIPS | 2022 | N/A | | |
| The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective | arXiv | 2022 | N/A | | |
| What Makes a Good Explanation?: A Harmonized View of Properties of Explanations | arXiv | 2022 | N/A | | |
| NoiseGrad — Enhancing Explanations by Introducing Stochasticity to Model Weights | AAAI | 2022 | Github | | |
| Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations | AIES (AAAI) | 2022 | N/A | | |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models | arXiv | 2022 | Github | | |
| Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off | NeurIPS | 2022 | Github | CBM, CEM | |
| Self-explaining deep models with logic rule reasoning | NeurIPS | 2022 | N/A | | |
| What You See is What You Classify: Black Box Attributions | NeurIPS | 2022 | N/A | | |
| Concept Activation Regions: A Generalized Framework For Concept-Based Explanations | NeurIPS | 2022 | N/A | | |
| What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods | NeurIPS | 2022 | N/A | | |
| Scalable Interpretability via Polynomials | NeurIPS | 2022 | N/A | | |
| Learning to Scaffold: Optimizing Model Explanations for Teaching | NeurIPS | 2022 | N/A | | |
| Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF | NeurIPS | 2022 | N/A | | |
| WeightedSHAP: analyzing and improving Shapley based feature attribution | NeurIPS | 2022 | N/A | | |
| Visual correspondence-based explanations improve AI robustness and human-AI team accuracy | NeurIPS | 2022 | N/A | | |
| VICE: Variational Interpretable Concept Embeddings | NeurIPS | 2022 | N/A | | |
| Robust Feature-Level Adversaries are Interpretability Tools | NeurIPS | 2022 | N/A | | |
| ProtoX: Explaining a Reinforcement Learning Agent via Prototyping | NeurIPS | 2022 | N/A | | |
| ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model | NeurIPS | 2022 | N/A | | |
| Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability | NeurIPS | 2022 | N/A | | |
| Neural Basis Models for Interpretability | NeurIPS | 2022 | N/A | | |
| Implications of Model Indeterminacy for Explanations of Automated Decisions | NeurIPS | 2022 | N/A | | |
| Explainability Via Causal Self-Talk | NeurIPS | 2022 | N/A | DeepMind | |
| TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations | NeurIPS | 2022 | N/A | | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | NeurIPS | 2022 | N/A | GoogleBrain | |
| OpenXAI: Towards a Transparent Evaluation of Model Explanations | NeurIPS | 2022 | N/A | | |
| Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations | NeurIPS | 2022 | N/A | | |
| Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | EMNLP | 2022 | N/A | | |
| Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations | EMNLP | 2022 | N/A | | |
| MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | EMNLP | 2022 | N/A | | |
| Towards Interactivity and Interpretability: A Rationale-based Legal Judgment Prediction Framework | EMNLP | 2022 | N/A | | |
| Explainable Question Answering based on Semantic Graph by Global Differentiable Learning and Dynamic Adaptive Reasoning | EMNLP | 2022 | N/A | | |
| Faithful Knowledge Graph Explanations in Commonsense Question Answering | EMNLP | 2022 | N/A | | |
| Optimal Interpretable Clustering Using Oblique Decision Trees | KDD | 2022 | N/A | | |
| ExMeshCNN: An Explainable Convolutional Neural Network Architecture for 3D Shape Analysis | KDD | 2022 | N/A | | |
| Learning Differential Operators for Interpretable Time Series Modeling | KDD | 2022 | N/A | | |
| Compute Like Humans: Interpretable Step-by-step Symbolic Computation with Deep Neural Network | KDD | 2022 | N/A | | |
| Causal Attention for Interpretable and Generalizable Graph Classification | KDD | 2022 | N/A | | |
| Group-wise Reinforcement Feature Generation for Optimal and Explainable Representation Space Reconstruction | KDD | 2022 | N/A | | |
| Label-Free Explainability for Unsupervised Models | ICML | 2022 | N/A | | |
| Rethinking Attention-Model Explainability through Faithfulness Violation Test | ICML | 2022 | N/A | | |
| Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods | ICML | 2022 | N/A | | |
| A Functional Information Perspective on Model Interpretation | ICML | 2022 | N/A | | |
| Inducing Causal Structure for Interpretable Neural Networks | ICML | 2022 | N/A | | |
| ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder | ICML | 2022 | N/A | | |
| Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings | ICML | 2022 | N/A | | |
| Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism | ICML | 2022 | N/A | | |
| Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers | ICML | 2022 | N/A | | |
| Robust Models Are More Interpretable Because Attributions Look Normal | ICML | 2022 | N/A | | |
| Latent Diffusion Energy-Based Model for Interpretable Text Modelling | ICML | 2022 | N/A | | |

### 2023

| Title | Venue | Year | Code | Keywords | Summary |
|---|---|---|---|---|---|
| On the Privacy Risks of Algorithmic Recourse | AISTATS | 2023 | N/A | | |
| Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten | ICML | 2023 | N/A | | |
| Tracr: Compiled Transformers as a Laboratory for Interpretability | arXiv | 2023 | Github | DeepMind | |
| Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse | ICLR | 2023 | N/A | | |
| Concept-level Debugging of Part-Prototype Networks | ICLR | 2023 | N/A | | |
| Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning | ICLR | 2023 | N/A | | |
| Re-calibrating Feature Attributions for Model Interpretation | ICLR | 2023 | N/A | | |
| Post-hoc Concept Bottleneck Models | ICLR | 2023 | N/A | | |
| Quantifying Memorization Across Neural Language Models | ICLR | 2023 | N/A | | |
| STREET: A Multi-Task Structured Reasoning and Explanation Benchmark | ICLR | 2023 | N/A | | |
| PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification | CVPR | 2023 | N/A | | |
| EVAL: Explainable Video Anomaly Localization | CVPR | 2023 | N/A | | |
| Overlooked Factors in Concept-based Explanations: Dataset Choice, Concept Learnability, and Human Capability | CVPR | 2023 | Github | | |
| Spatial-Temporal Concept Based Explanation of 3D ConvNets | CVPR | 2023 | Github | | |
| Adversarial Counterfactual Visual Explanations | CVPR | 2023 | N/A | | |
| Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification | CVPR | 2023 | N/A | | |
| Explaining Image Classifiers With Multiscale Directional Image Representation | CVPR | 2023 | N/A | | |
| CRAFT: Concept Recursive Activation FacTorization for Explainability | CVPR | 2023 | N/A | | |
| SketchXAI: A First Look at Explainability for Human Sketches | CVPR | 2023 | N/A | | |
| Don't Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis | CVPR | 2023 | N/A | | |
| Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning | CVPR | 2023 | N/A | | |
| Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification | CVPR | 2023 | N/A | | |
| Interpretable Neural-Symbolic Concept Reasoning | ICML | 2023 | Github | | |
| Identifying Interpretable Subspaces in Image Representations | ICML | 2023 | N/A | | |
| Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat | ICML | 2023 | Github | | |
| Explainability as statistical inference | ICML | 2023 | N/A | | |
| On the Impact of Knowledge Distillation for Model Interpretability | ICML | 2023 | N/A | | |
| NA2Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning | ICML | 2023 | N/A | | |
| Explaining Reinforcement Learning with Shapley Values | ICML | 2023 | N/A | | |
| Explainable Data-Driven Optimization: From Context to Decision and Back Again | ICML | 2023 | N/A | | |
| Causal Proxy Models for Concept-based Model Explanations | ICML | 2023 | N/A | | |
| Learning Perturbations to Explain Time Series Predictions | ICML | 2023 | N/A | | |
| Rethinking Explaining Graph Neural Networks via Non-parametric Subgraph Matching | ICML | 2023 | N/A | | |
| Representer Point Selection for Explaining Regularized High-dimensional Models | ICML | 2023 | N/A | | |
| Towards Explaining Distribution Shifts | ICML | 2023 | N/A | | |
| Relevant Walk Search for Explaining Graph Neural Networks | ICML | 2023 | Github | | |
| Concept-based Explanations for Out-of-Distribution Detectors | ICML | 2023 | N/A | | |
| GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations | ICML | 2023 | N/A | | |
| Robust Explanation for Free or At the Cost of Faithfulness | ICML | 2023 | N/A | | |
| Learn to Accumulate Evidence from All Training Samples: Theory and Practice | ICML | 2023 | N/A | | |
| Towards Trustworthy Explanation: On Causal Rationalization | ICML | 2023 | N/A | | |
| Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables | ICML | 2023 | N/A | | |
| Probabilistic Concept Bottleneck Models | ICML | 2023 | N/A | | |
| What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective | ICML | 2023 | N/A | | |
| Towards credible visual model interpretation with path attribution | ICML | 2023 | N/A | | |
| Trainability, Expressivity and Interpretability in Gated Neural ODEs | ICML | 2023 | N/A | | |
| Discover and Cure: Concept-aware Mitigation of Spurious Correlation | ICML | 2023 | N/A | | |
| PWSHAP: A Path-Wise Explanation Model for Targeted Variables | ICML | 2023 | N/A | | |
| A Closer Look at the Intervention Procedure of Concept Bottleneck Models | ICML | 2023 | N/A | | |
| Rethinking Interpretation: Input-Agnostic Saliency Mapping of Deep Visual Classifiers | AAAI | 2023 | N/A | | |
| TopicFM: Robust and Interpretable Topic-Assisted Feature Matching | AAAI | 2023 | N/A | | |
| Solving Explainability Queries with Quantification: The Case of Feature Relevancy | AAAI | 2023 | N/A | | |
| PEN: Prediction-Explanation Network to Forecast Stock Price Movement with Better Explainability | AAAI | 2023 | N/A | | |
| KerPrint: Local-Global Knowledge Graph Enhanced Diagnosis Prediction for Retrospective and Prospective Interpretations | AAAI | 2023 | N/A | | |
| Beyond Graph Convolutional Network: An Interpretable Regularizer-Centered Optimization Framework | AAAI | 2023 | N/A | | |
| Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling | AAAI | 2023 | N/A | | |
| Learning Interpretable Temporal Properties from Positive Examples Only | AAAI | 2023 | N/A | | |
| Symbolic Metamodels for Interpreting Black-Boxes Using Primitive Functions | AAAI | 2023 | N/A | | |
| Towards More Robust Interpretation via Local Gradient Alignment | AAAI | 2023 | N/A | | |
| Towards Fine-Grained Explainability for Heterogeneous Graph Neural Network | AAAI | 2023 | N/A | | |
| XClusters: Explainability-First Clustering | AAAI | 2023 | N/A | | |
| Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis | AAAI | 2023 | N/A | | |
| Fairness and Explainability: Bridging the Gap towards Fair Model Explanations | AAAI | 2023 | N/A | | |
| Explaining Model Confidence Using Counterfactuals | AAAI | 2023 | N/A | | |
| SEAT: Stable and Explainable Attention | AAAI | 2023 | N/A | | |
| Factual and Informative Review Generation for Explainable Recommendation | AAAI | 2023 | N/A | | |
| Improving Interpretability via Explicit Word Interaction Graph Layer | AAAI | 2023 | N/A | | |
| Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing | AAAI | 2023 | N/A | | |
| Improving Interpretability of Deep Sequential Knowledge Tracing Models with Question-centric Cognitive Representations | AAAI | 2023 | N/A | | |
| Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe | AAAI | 2023 | N/A | | |
| eForecaster: Unifying Electricity Forecasting with Robust, Flexible, and Explainable Machine Learning Algorithms | AAAI | 2023 | N/A | | |
| SolderNet: Towards Trustworthy Visual Inspection of Solder Joints in Electronics Manufacturing Using Explainable Artificial Intelligence | AAAI | 2023 | N/A | | |
| Xaitk-Saliency: An Open Source Explainable AI Toolkit for Saliency | AAAI | 2023 | N/A | | |
| Ripple: Concept-Based Interpretation for Raw Time Series Models in Education | AAAI | 2023 | N/A | | |
| Semantics, Ontology and Explanation | arXiv | 2023 | N/A | Ontological Unpacking | |