shap/shap

[Meta-issue] Notebooks are outdated / non-runnable

thatlittleboy opened this issue · 9 comments

Background

We should also make sure that our documentation is kept up to date.

A scour through the open issues in this repo and on StackOverflow shows that outdated documentation (or the lack thereof) is causing confusion among our users.

Just some examples:


Plan

The plan here is to thoroughly go through each and every notebook example that we have, to:

  • Run the notebook from top-to-bottom and ensure there are no errors.
  • Update the prose where necessary to improve clarity and fix any typos.
  • Update the code (where necessary and appropriate) to demonstrate up-to-date API & Python features, such as:
    • Replacing the deprecated boston dataset
    • Using the Explanation API for plotting (e.g. shap.plots.beeswarm rather than shap.summary_plot)

An example of such a PR: #3037

Useful info for contributors

To potential contributors (thank you in advance!), please stick to one notebook per PR when contributing.
Also, any help in updating the GPU-related notebooks would be very much appreciated.

We have two linting checks for notebooks, each of which has an "exclude list" that ignores notebooks that haven't been cleaned. When a notebook has been fixed up, it should be removed from the "exclude" list in these places:

  • The run_notebooks_timeout job in scripts/run_notebooks_timeouts.py
  • The nbcheckorder job in .pre-commit-config.yaml
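To give a feel for what an execution-order check looks for, here is a minimal sketch that reads a notebook's JSON and tests whether its executed code cells have increasing execution counts. It is not the implementation of the nbcheckorder hook, whose exact rules may differ; the function names and the in-memory notebooks are hypothetical.

```python
import json
from pathlib import Path


def execution_counts(notebook: dict) -> list[int]:
    """Return the execution counts of all executed code cells, in document order."""
    return [
        cell["execution_count"]
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and cell.get("execution_count") is not None
    ]


def ran_top_to_bottom(notebook: dict) -> bool:
    """True if executed code cells have strictly increasing execution counts."""
    counts = execution_counts(notebook)
    return all(earlier < later for earlier, later in zip(counts, counts[1:]))


def check_file(path: str) -> bool:
    """Load an .ipynb file (which is plain JSON) and check its execution order."""
    return ran_top_to_bottom(json.loads(Path(path).read_text(encoding="utf-8")))


# Tiny in-memory notebooks standing in for real .ipynb files.
in_order = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "execution_count": 1, "source": "import shap"},
    {"cell_type": "code", "execution_count": 2, "source": "print('ok')"},
]}
out_of_order = {"cells": [
    {"cell_type": "code", "execution_count": 3, "source": "x = 1"},
    {"cell_type": "code", "execution_count": 2, "source": "y = x"},
]}

print(ran_top_to_bottom(in_order))      # True
print(ran_top_to_bottom(out_of_order))  # False
```

Restarting the kernel and running all cells before committing is usually enough to make a notebook pass a check of this kind.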

For more details on how to preview the built documentation, see the contributing guide.

TODO

  • notebooks
    • text_examples
      • language_modelling
        • Language Modeling Explanation Demo.ipynb (#3691)
      • text_generation
        • Open Ended GPT2 Text Generation Explanations.ipynb
      • question_answering
        • Explaining a Question Answering Transformers Model.ipynb
      • translation
        • Machine Translation Explanations.ipynb
      • summarization
        • Abstractive Summarization Explanation Demo.ipynb
      • text_entailment
        • Textual Entailment Explanation Demo.ipynb
      • sentiment_analysis
        • Using custom functions and tokenizers.ipynb
        • Emotion classification multiclass example.ipynb
        • Positive vs. Negative Sentiment Classification.ipynb
        • Keras LSTM for IMDB Sentiment Classification.ipynb
    • image_examples
      • image_captioning
        • Image Captioning using Azure Cognitive Services.ipynb
        • Image Captioning using Open Source.ipynb
      • image_classification
        • Explain ResNet50 using the Partition explainer.ipynb
        • PyTorch Deep Explainer MNIST example.ipynb (#3591)
        • Explain MobilenetV2 using the Partition explainer (PyTorch).ipynb
        • Multi-class ResNet50 on ImageNet (TensorFlow)-checkpoint.ipynb
        • Front Page DeepExplainer MNIST Example.ipynb (#3393)
        • Multi-class ResNet50 on ImageNet (TensorFlow).ipynb
        • Multi-input Gradient Explainer MNIST Example.ipynb
        • Explain an Intermediate Layer of VGG16 on ImageNet (PyTorch).ipynb
        • Image Multi Class.ipynb
        • Explain an Intermediate Layer of VGG16 on ImageNet.ipynb
    • benchmarks
      • tabular
        • Tabular Prediction Benchmark Demo.ipynb (#3338)
        • Benchmark XGBoost explanations.ipynb (#3339)
      • others
        • Benchmark Debug Mode.ipynb
      • image
        • Image Multiclass Classification Benchmark Demo.ipynb
      • text
        • Machine Translation Benchmark Demo.ipynb
        • Abstractive Summarization Benchmark Demo.ipynb
        • Text Emotion Multiclass Classification Benchmark Demo.ipynb
    • overviews
      • Explaining quantitative measures of fairness.ipynb
      • Be careful when interpreting predictive models in search of causal insights.ipynb
      • An introduction to explainable AI with Shapley values.ipynb
    • genomic_examples
      • DeepExplainer Genomics Example.ipynb (#3458)
    • api_examples
      • maskers
        • custom.ipynb
      • explainers
        • Permutation.ipynb
        • Exact.ipynb
        • GPUTree.ipynb
      • plots
        • text.ipynb
        • decision_plot.ipynb
        • scatter.ipynb #3752
        • waterfall.ipynb
        • beeswarm.ipynb
        • violin.ipynb
        • image.ipynb
        • bar.ipynb #3523
        • heatmap.ipynb
    • tabular_examples
      • model_agnostic
        • Squashing Effect.ipynb
        • Census income classification with scikit-learn.ipynb
        • Simple Kernel SHAP.ipynb
        • Multioutput Regression SHAP.ipynb
        • Iris classification with scikit-learn.ipynb
        • Diabetes regression.ipynb
        • Simple Boston California Demo.ipynb (#3332)
      • tree_based_models
        • League of Legends Win Prediction with XGBoost.ipynb (#3275)
        • Understanding Tree SHAP for Simple Models.ipynb (#3278, #3749)
        • Census income classification with XGBoost.ipynb
        • Example of loading a custom tree model into SHAP.ipynb (#3304)
        • Census income classification with LightGBM.ipynb (#3303)
        • Perfomance Comparison.ipynb
        • NHANES I Survival Model.ipynb (#3395)
        • Python Version of Tree SHAP.ipynb (#3335)
        • Catboost tutorial.ipynb (#3214)
        • Scatter Density vs. Violin Plot Comparison.ipynb (#3396)
        • Explaining a simple OR function.ipynb (#3037)
        • Basic SHAP Interaction Value Example in XGBoost.ipynb (#3346)
        • Force Plot Colors.ipynb (#3336)
        • Fitting a Linear Simulation with XGBoost.ipynb
        • Front page example (XGBoost).ipynb (#3337)
        • Explaining the Loss of a Model.ipynb
      • neural_networks
        • Census income classification with Keras.ipynb
      • linear_models
        • Sentiment Analysis with Logistic Regression.ipynb (#3127)
        • Explaining a model that uses standardized features.ipynb (#3112)
        • Math behind LinearExplainer with correlation feature perturbation.ipynb

This is a great endeavour! It might be helpful to add the guidelines for notebooks (e.g. linters, style, etc.) to the contributing guide, perhaps under a general section about "things in need of attention".

Another idea which could support this endeavour could be to make use of some more Sphinx features to make the notebooks a bit more navigable, for example the Gallery view. There's a guide here:

https://docs.readthedocs.io/en/stable/guides/jupyter.html

+1 as I was trying to read the docs and many images were missing

@thatlittleboy I noticed that we seem to have two separate sets of notebooks used for documentation:

  • The /docs/notebooks directory of .ipynb notebooks is hosted on ReadTheDocs
  • The /notebooks directory of html pages is hosted at shap.github.io, for example here. These are referenced in the Readme, for example.

Shall we aim to remove this duplication, putting everything in one place in one format?

znacer commented

Hi,
I tried to update a few notebooks, but I am stuck with an error from the pre-commit check (Import block is un-sorted or un-formatted). I tried to fix it by rearranging the imports, but that didn't work. I would be glad if someone could help with this (#3250).

Have you installed pre-commit in your working environment? Have a look in the contributing guide for more detailed steps, and let us know if anything isn't clear.

Your specific issue seems to be because some files have not been auto-formatted correctly. You should be able to fix your issue by running:

pre-commit run --all-files

Then, commit and push your changes as usual.

N.B. Alternatively, you can run the ruff linter directly on a notebook with:

ruff check path/to/notebook.ipynb

Note that you need the latest ruff version installed to lint .ipynb files.

znacer commented

Thanks! I managed to fix the problem with ruff check notebook.ipynb --fix.

@thatlittleboy I noticed that we seem to have two separate sets of notebooks used for documentation:

* The `/docs/notebooks` directory of `.ipynb` notebooks is hosted on ReadTheDocs

* The `/notebooks` directory of `html` pages is hosted at `shap.github.io`, for example [here](https://shap.github.io/shap/notebooks/tree_explainer/Census%20income%20classification%20with%20LightGBM.html). These are referenced in the Readme, for example.

Shall we aim to remove this duplication, putting everything in one place in one format?

@connortann Sorry I missed this, yea seems like a good idea. Let's leave it as a to-do.

Hope that Image Multiclass Classification Benchmark Demo.ipynb can be fixed soon. OVO