Opinionated list of resources facilitating model interpretability (introspection, simplification, visualization, explanation).
- Interpretable models (a minimal decision-tree sketch follows the references below)
- Simple decision trees
- Rules
- (Regularized) linear regression
- k-NN
- Interpretable Classifiers Using Rules and Bayesian Analysis: Building a Better Stroke Prediction Model https://arxiv.org/pdf/1511.01644.pdf
- Comprehensible Classification Models – a position paper by Alex A. Freitas
- Interesting discussion of interpretability for a few classification models (decision trees, classification rules, decision tables, nearest neighbors, and Bayesian network classifiers)
- http://www.kdd.org/exploration_files/V15-01-01-Freitas.pdf
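A minimal sketch of why small decision trees count as interpretable: scikit-learn can render a fitted tree as nested if-then rules readable without any tooling (the dataset and depth below are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# A shallow tree stays small enough to read as a handful of rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```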
- Models offering feature importance measures
- Random forest
- Boosted trees
- Extremely randomized trees https://doi.org/10.1007/s10994-006-6226-1
- Linear regression (with a grain of salt)
- Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the “Rashomon” Perspective
- Universal (model-agnostic) variable importance measure
- https://arxiv.org/pdf/1801.01489
- https://github.com/aaronjfisher/mcr
- Bias in random forest variable importance measures: Illustrations, sources and a solution https://doi.org/10.1186/1471-2105-8-25
- Conditional variable importance for random forests https://doi.org/10.1186/1471-2105-9-307
- Very good blog post describing the deficiencies of random forest feature importance and introducing permutation importance as an alternative
- Permutation importance, a simple model-agnostic approach, is described in the ELI5 documentation (a minimal sketch follows below)
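A minimal sketch of permutation importance next to a random forest's built-in impurity-based importances (the dataset and scorer are arbitrary stand-ins, not anything prescribed by the posts above):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

baseline = model.score(X_val, y_val)
rng = np.random.RandomState(0)
permutation_importance = []
for j in range(X_val.shape[1]):
    X_perm = X_val.copy()
    rng.shuffle(X_perm[:, j])  # break the feature/target relationship for one feature
    permutation_importance.append(baseline - model.score(X_perm, y_val))

# Compare with the (potentially biased) impurity-based importances
for j in np.argsort(permutation_importance)[-5:]:
    print(j, permutation_importance[j], model.feature_importances_[j])
```

Note that the importances are measured on held-out data, and only `score` (or any other evaluation metric) is needed, which is exactly what makes the approach model-agnostic.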
- Classification of feature selection methods
- Filters
- Wrappers
- Embedded methods
- Feature Engineering and Selection by Kuhn & Johnson
- Slightly off-topic, but a very interesting book
- http://www.feat.engineering/index.html
- https://bookdown.org/max/FES/
- https://github.com/topepo/FES
- Magnets by R. P. Feynman https://www.youtube.com/watch?v=wMFPe-DwULM
- The Mythos of Model Interpretability
- The Promise and Peril of Human Evaluation for Model Interpretability
- Towards A Rigorous Science of Interpretable Machine Learning https://arxiv.org/pdf/1702.08608
- The Book of Why: The New Science of Cause and Effect by Judea Pearl
- Looking Inside the Black Box, a presentation by Leo Breiman
- LIME (Local Interpretable Model-agnostic Explanations)
- SHAP (SHapley Additive exPlanations), generalizing LIME
- Anchors: High-Precision Model-Agnostic Explanations, another improvement over LIME (see the LIME/SHAP usage sketch below)
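Since the three entries above build on one another, a minimal usage sketch may help; it explains a single prediction of one model with both the `lime` and `shap` Python packages (the dataset, model, and parameters are arbitrary choices):

```python
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

# LIME: fit a sparse local surrogate model around one instance
lime_explainer = LimeTabularExplainer(
    data.data, feature_names=list(data.feature_names), class_names=list(data.target_names))
lime_exp = lime_explainer.explain_instance(data.data[0], clf.predict_proba, num_features=5)
print(lime_exp.as_list())  # (feature condition, local weight) pairs

# SHAP: Shapley-value attributions for the same instance
shap_explainer = shap.TreeExplainer(clf)
print(shap_explainer.shap_values(data.data[:1]))
```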
- Explanations of Model Predictions with live and breakDown Packages
- Model Explanation System by Ryan Turner
- Understanding Black-box Predictions via Influence Functions
- A review book - Interpretable Machine Learning. A Guide for Making Black Box Models Explainable by Christoph Molnar
- Visualizing and Understanding Convolutional Networks
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- Understanding Neural Networks Through Deep Visualization
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- Generating Visual Explanations
- Rationalizing Neural Network Predictions
- Pixel entropy can be used to detect relevant picture regions (for ConvNets)
- See Visualization section and Fig. 5 of the paper High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
- Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks
- Axiomatic Attribution for Deep Networks
- Proposes the Integrated Gradients method (a minimal sketch follows the links below)
- https://arxiv.org/pdf/1703.01365.pdf
- Code: https://github.com/ankurtaly/Integrated-Gradients
- See also: Gradients of Counterfactuals https://arxiv.org/pdf/1611.02639.pdf
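A minimal sketch of the integrated gradients computation itself; `grad_fn` here is a hypothetical user-supplied function returning the gradient of the model output at a point, while the straight-line path and Riemann-sum approximation follow the paper:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - baseline_i) * integral of dF/dx_i along the path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of the baseline-to-x path
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy check on F(x) = x0**2 + 3*x1, whose gradient is (2*x0, 3)
grad_fn = lambda x: np.array([2.0 * x[0], 3.0])
attributions = integrated_gradients(grad_fn, np.array([1.0, 2.0]), np.zeros(2))
print(attributions, attributions.sum())  # sums to F(x) - F(baseline) = 7
```

The final `print` illustrates the paper's completeness axiom: attributions sum to the difference between the outputs at `x` and at the baseline.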
- Learning Important Features Through Propagating Activation Differences
- Proposes the DeepLIFT method (a minimal sketch of its core idea follows the links below)
- https://arxiv.org/pdf/1704.02685.pdf
- Code: https://github.com/kundajelab/deeplift
- Videos: https://www.youtube.com/playlist?list=PLJLjQOkqSRTP3cLB2cOOi_bQFw6KPGKML
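DeepLIFT attributes the difference between an output and its value at a reference input. A minimal sketch of this "summation-to-delta" property for the purely linear case, where the contributions have a closed form (the weights, input, and reference are arbitrary numbers):

```python
import numpy as np

# For a linear model f(x) = w @ x + b, DeepLIFT's contributions reduce to
# C_i = w_i * (x_i - x0_i), and they sum exactly to f(x) - f(x0).
w, b = np.array([0.5, -1.2, 2.0]), 0.3
f = lambda x: w @ x + b

x = np.array([1.0, 0.0, 2.0])   # input to explain
x0 = np.zeros(3)                # reference input
contributions = w * (x - x0)
assert np.isclose(contributions.sum(), f(x) - f(x0))
print(contributions)
```

For nonlinear networks the method propagates such difference-from-reference contributions layer by layer, which is what the `deeplift` code linked above implements.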
- The (Un)reliability of saliency methods
- Review of failures for methods extracting most important pixels for prediction
- https://arxiv.org/pdf/1711.00867.pdf
- Classifier-agnostic Saliency Map Extraction
- The Building Blocks of Interpretability
- https://distill.pub/2018/building-blocks
- Has some embedded links to notebooks
- Uses Lucid library https://github.com/tensorflow/lucid
- Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
- Distilling a Neural Network Into a Soft Decision Tree
- Visualizing Statistical Models: Removing the blindfold
- Partial dependence plots (a minimal computation sketch follows the links below)
- http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html
- pdp: An R Package for Constructing Partial Dependence Plots https://journal.r-project.org/archive/2017/RJ-2017-016/RJ-2017-016.pdf https://cran.r-project.org/web/packages/pdp/index.html
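The computation behind a partial dependence plot fits in a few lines; this sketch clamps a single feature to each grid value and averages the model's predictions (`model` and `X` are hypothetical stand-ins for any fitted estimator and its feature matrix):

```python
import numpy as np

def partial_dependence(model, X, feature_idx, grid):
    # Average prediction with feature `feature_idx` clamped to each grid value
    curve = []
    for value in grid:
        X_clamped = X.copy()
        X_clamped[:, feature_idx] = value
        curve.append(model.predict(X_clamped).mean())
    return np.asarray(curve)

# Usage sketch: a grid spanning the observed range of feature 0
# grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=20)
# curve = partial_dependence(model, X, 0, grid)
```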
- ggfortify: Unified Interface to Visualize Statistical Results of Popular R Packages
- RandomForestExplainer
- ggRandomForest
- Tutorial on Interpretable Machine Learning at ICML 2017
- P. Biecek, Show Me Your Model: tools for visualisation of statistical models
- S. Ritchie, Just-So Stories of AI
- C. Jarmul, Towards Interpretable Accountable Models
- I. Ozsvald, Machine Learning Libraries You’d Wish You’d Known About
- A large part of the talk covers model explanation and visualization
- Video: https://www.youtube.com/watch?v=nDF7_8FOhpI
- Associated notebook on explaining regression predictions: https://github.com/ianozsvald/data_science_delivered/blob/master/ml_explain_regression_prediction.ipynb
- G. Varoquaux, Understanding and diagnosing your machine-learning models (covers PDP and Lime among others)
- Interpretable ML Symposium (NIPS 2017) (contains links to papers, slides and videos)
- http://interpretable.ml/
- Debate: Interpretability is necessary in machine learning
- 2017 Workshop on Human Interpretability in Machine Learning (WHI) (in conjunction with ICML 2017) (contains links to papers and slides)
Software related to papers is mentioned above along with each publication; this section lists only standalone software.
- ELI5 - Python package dedicated to debugging machine learning classifiers and explaining their predictions (a minimal usage sketch closes this section)
- yellowbrick - visual analysis and diagnostic tools to facilitate machine learning model selection
- lime - R package implementing LIME
- forestmodel - R package visualizing coefficients of different models with the so-called forest plot
- DALEX - Descriptive mAchine Learning EXplanations
- Lucid - a collection of infrastructure and tools for research in neural network interpretability
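As a closing illustration, a minimal ELI5 usage sketch; `explain_weights` and `explain_prediction` are the package's two core entry points, while the estimator and data are arbitrary choices:

```python
import eli5
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
clf = LogisticRegression(max_iter=1000).fit(data.data, data.target)
names = list(data.feature_names)

# Global view: the model's coefficients as feature weights
print(eli5.format_as_text(eli5.explain_weights(clf, feature_names=names)))

# Local view: per-feature contributions to a single prediction
print(eli5.format_as_text(eli5.explain_prediction(clf, data.data[0], feature_names=names)))
```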