Book: Interpretable Machine Learning
- LIME: “Why Should I Trust You?” Explaining the Predictions of Any Classifier
- SHAP-A Unified Approach to Interpreting Model Predictions
- Anchors: High-Precision Model-Agnostic Explanations
- How to Explain Individual Classification Decisions
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- SmoothGrad: removing noise by adding noise
- Axiomatic Attribution for Deep Networks
- Towards better understanding of gradient-based attribution methods for Deep Neural Networks
- Contextual Prediction Difference Analysis for Explaining Individual Image Classifications
- Learning important features through propagating activation differences
- Learning Deep Features for Discriminative Localization
- Examples are not enough, learn to criticize! Criticism for Interpretability
- Representer Point Selection for Explaining Deep Neural Networks
- Evaluating Explanations: How much do explanations from the teacher aid students?
- Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
- Interpretable Counterfactual Explanations Guided by Prototypes
- Explaining machine learning classifiers through diverse counterfactual explanations
- Model-agnostic counterfactual explanations for consequential decisions
- SP-LIME: “Why Should I Trust You?” Explaining the Predictions of Any Classifier
- Network Dissection: Quantifying Interpretability of Deep Visual Representations
- Compositional Explanations of Neurons
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
- Learning Global Additive Explanations for Neural Nets Using Model Distillation
- Interpreting Blackbox Models via Model Extraction
- Faithful and Customizable Explanations of Black Box Models
- Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
- Matthews 2019 (didn't find)
- Understanding Black-box Predictions via Influence Functions (Prototype-based explanations)
- Interpretable Classifiers Using Rules and Bayesian Analysis: Building a Better Stroke Prediction Model (Rule-based)
- Interpretable Decision Sets: A Joint Framework for Description and Prediction (Rule-based)
- Avoiding Disparate Impact with Counterfactual Distributions (Counterfactual)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (Prototypes)
- Towards A Rigorous Science of Interpretable Machine Learning
- An Evaluation of the Human-Interpretability of Explanation
- Visualizing Deep Networks by Optimizing with Integrated Gradients
- Data Shapley: Equitable Valuation of Data for Machine Learning
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
- Manipulating and Measuring Model Interpretability
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier
- FIND: Human-in-the-Loop Debugging Deep Text Classifiers
- Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications
- On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection
- Teaching Categories to Human Learners with Visual Explanations
- Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values
- When Explanations Lie: Why Many Modified BP Attributions Fail
- Explanations can be manipulated and geometry is to blame
- Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
- Interpretation of Neural Networks is Fragile
- Fairwashing Explanations with Off-Manifold Detergent (Against manipulation)
- On the Robustness of Interpretability Methods
- On the (In)fidelity and Sensitivity of Explanations
- SAM: The Sensitivity of Attribution Methods to Hyperparameters
- Semantically Equivalent Adversarial Rules for Debugging NLP Models
- Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge