[Meta-issue] Notebooks are outdated / non-runnable
thatlittleboy opened this issue · 9 comments
Background
We should also make sure that our documentation is kept up to date.
A scour through the open issues in this repo and on Stack Overflow shows that outdated documentation (or the lack thereof) is causing confusion among our users.
Just some examples:
Plan
The plan here is to thoroughly go through each and every notebook example that we have, to:
- Run the notebook from top-to-bottom and ensure there are no errors.
- Update the prose where necessary to improve clarity and fix any typos.
- Update the code (where necessary and appropriate) to demonstrate up-to-date API & Python features, such as:
  - Replacing the deprecated Boston dataset
  - Using the `Explanation` API for plotting (e.g. `shap.plots.beeswarm` rather than `shap.summary_plot`)
An example of such a PR: #3037
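As a rough illustration of the "run top-to-bottom" step, here is a minimal sketch that executes a notebook through `jupyter nbconvert` and reports whether it finished without errors. It assumes Jupyter and nbconvert are installed in the working environment; the helper names `nbconvert_command` and `runs_cleanly` are made up for this example and are not part of the repo's scripts.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook: Path, timeout_s: int = 600) -> list[str]:
    """Build a command that executes every cell of `notebook` in order.

    The executed notebook is written to stdout, so the file on disk is untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Per-cell timeout in seconds; the 600 default is an arbitrary choice.
        f"--ExecutePreprocessor.timeout={timeout_s}",
        "--stdout",
        str(notebook),
    ]


def runs_cleanly(notebook: Path, timeout_s: int = 600) -> bool:
    """Return True if nbconvert exits with status 0 (no cell raised or timed out)."""
    result = subprocess.run(
        nbconvert_command(notebook, timeout_s),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Example usage (needs Jupyter installed and a real notebook path):
# print(runs_cleanly(Path("notebooks/overviews/some_notebook.ipynb")))
```

Running one notebook at a time like this also fits the one-notebook-per-PR workflow, since a failure points directly at the notebook being cleaned up.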
Useful info for contributors
To potential contributors (thank you in advance!), please stick to one notebook per PR when contributing.
Also, any help in updating the GPU-related notebooks would be very much appreciated.
We have two linting checks for notebooks, each of which has an "exclude list" that ignores notebooks that haven't been cleaned. When a notebook has been fixed up, it should be removed from the "exclude" list in these places:
- The `run_notebooks_timeout` job in `scripts/run_notebooks_timeouts.py`
- The `nbcheckorder` job in `.pre-commit-config.yaml`
For more details on how to preview the built documentation, see the contributing guide.
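To get a feel for what an execution-order check looks at before removing a notebook from an exclude list, the sketch below inspects the notebook JSON directly. It assumes the check amounts to "executed code cells are numbered 1, 2, 3, ... in document order"; the real `nbcheckorder` hook may apply different rules, and the function names here are hypothetical.

```python
import json
from pathlib import Path


def load_notebook(path: Path) -> dict:
    """Read an .ipynb file, which is plain JSON on disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def cells_executed_in_order(notebook: dict) -> bool:
    """Return True if executed code cells are numbered 1, 2, 3, ... top to bottom."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is not None
    ]
    return counts == list(range(1, len(counts) + 1))


# Two tiny in-memory notebooks: one run cleanly, one with cells re-run out of order.
clean = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["y = x + 1"]},
]}
stale = {"cells": [
    {"cell_type": "code", "execution_count": 3, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["y = x + 1"]},
]}

print(cells_executed_in_order(clean))
print(cells_executed_in_order(stale))
```

A notebook that fails a check like this usually just needs a kernel restart followed by "Run All" before it is committed.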
TODO
- notebooks
  - text_examples
    - language_modelling
      - Language Modeling Explanation Demo.ipynb (#3691)
    - text_generation
      - Open Ended GPT2 Text Generation Explanations.ipynb
    - question_answering
      - Explaining a Question Answering Transformers Model.ipynb
    - translation
      - Machine Translation Explanations.ipynb
    - summarization
      - Abstractive Summarization Explanation Demo.ipynb
    - text_entailment
      - Textual Entailment Explanation Demo.ipynb
    - sentiment_analysis
      - Using custom functions and tokenizers.ipynb
      - Emotion classification multiclass example.ipynb
      - Positive vs. Negative Sentiment Classification.ipynb
      - Keras LSTM for IMDB Sentiment Classification.ipynb
  - image_examples
    - image_captioning
      - Image Captioning using Azure Cognitive Services.ipynb
      - Image Captioning using Open Source.ipynb
    - image_classification
      - Explain ResNet50 using the Partition explainer.ipynb
      - PyTorch Deep Explainer MNIST example.ipynb (#3591)
      - Explain MobilenetV2 using the Partition explainer (PyTorch).ipynb
      - Multi-class ResNet50 on ImageNet (TensorFlow)-checkpoint.ipynb
      - Front Page DeepExplainer MNIST Example.ipynb (#3393)
      - Multi-class ResNet50 on ImageNet (TensorFlow).ipynb
      - Multi-input Gradient Explainer MNIST Example.ipynb
      - Explain an Intermediate Layer of VGG16 on ImageNet (PyTorch).ipynb
      - Image Multi Class.ipynb
      - Explain an Intermediate Layer of VGG16 on ImageNet.ipynb
  - benchmarks
    - tabular
    - others
      - Benchmark Debug Mode.ipynb
    - image
      - Image Multiclass Classification Benchmark Demo.ipynb
    - text
      - Machine Translation Benchmark Demo.ipynb
      - Abstractive Summarization Benchmark Demo.ipynb
      - Text Emotion Multiclass Classification Benchmark Demo.ipynb
  - overviews
    - Explaining quantitative measures of fairness.ipynb
    - Be careful when interpreting predictive models in search of causal insights.ipynb
    - An introduction to explainable AI with Shapley values.ipynb
  - genomic_examples
    - DeepExplainer Genomics Example.ipynb (#3458)
  - api_examples
  - tabular_examples
    - model_agnostic
      - Squashing Effect.ipynb
      - Census income classification with scikit-learn.ipynb
      - Simple Kernel SHAP.ipynb
      - Multioutput Regression SHAP.ipynb
      - Iris classification with scikit-learn.ipynb
      - Diabetes regression.ipynb
      - Simple ~~Boston~~ California Demo.ipynb (#3332)
    - tree_based_models
      - League of Legends Win Prediction with XGBoost.ipynb (#3275)
      - Understanding Tree SHAP for Simple Models.ipynb (#3278, #3749)
      - Census income classification with XGBoost.ipynb
      - Example of loading a custom tree model into SHAP.ipynb (#3304)
      - Census income classification with LightGBM.ipynb (#3303)
      - Perfomance Comparison.ipynb
      - NHANES I Survival Model.ipynb (#3395)
      - Python Version of Tree SHAP.ipynb (#3335)
      - Catboost tutorial.ipynb (#3214)
      - Scatter Density vs. Violin Plot Comparison.ipynb (#3396)
      - Explaining a simple OR function.ipynb (#3037)
      - Basic SHAP Interaction Value Example in XGBoost.ipynb (#3346)
      - Force Plot Colors.ipynb (#3336)
      - Fitting a Linear Simulation with XGBoost.ipynb
      - Front page example (XGBoost).ipynb (#3337)
      - Explaining the Loss of a Model.ipynb
    - neural_networks
      - Census income classification with Keras.ipynb
    - linear_models
This is a great endeavour! It might be helpful to add the guidelines for notebooks (e.g. linters, style etc) in the contributing guide, perhaps under a general section about "things in need of attention"
Another idea which could support this endeavour could be to make use of some more Sphinx features to make the notebooks a bit more navigable, for example the Gallery view. There's a guide here:
+1 as I was trying to read the docs and many images were missing
@thatlittleboy I noticed that we seem to have two separate sets of notebooks used for documentation:
- The `/docs/notebooks` directory of `.ipynb` notebooks is hosted on ReadTheDocs
- The `/notebooks` directory of `html` pages is hosted at `shap.github.io`, for example [here](https://shap.github.io/shap/notebooks/tree_explainer/Census%20income%20classification%20with%20LightGBM.html). These are referenced in the Readme, for example.
Shall we aim to remove this duplication, putting everything in one place in one format?
Hi,
I tried to update a few notebooks, but I am stuck on an error from the pre-commit check (`Import block is un-sorted or un-formatted`). I tried to fix it by rearranging the imports, but that didn't work. I would be glad if someone could help with this (#3250).
Have you installed pre-commit in your working environment? Have a look in the contributing guide for more detailed steps, and let us know if anything isn't clear.
Your specific issue seems to be because some files have not been auto-formatted correctly. You should be able to fix your issue by running:
pre-commit run --all-files
Then, commit and push your changes as usual.
N.B. Alternatively, you can run the ruff linter directly on a notebook with:
ruff check path/to/notebook.ipynb
Note that you need the latest ruff version installed to lint `.ipynb` files.
Thanks! I managed to correct the problem with `ruff check notebook.ipynb --fix`.
@thatlittleboy I noticed that we seem to have two separate sets of notebooks used for documentation:
* The `/docs/notebooks` directory of `.ipynb` notebooks is hosted on ReadTheDocs
* The `/notebooks` directory of `html` pages is hosted at `shap.github.io`, for example [here](https://shap.github.io/shap/notebooks/tree_explainer/Census%20income%20classification%20with%20LightGBM.html). These are referenced in the Readme, for example.
Shall we aim to remove this duplication, putting everything in one place in one format?
@connortann Sorry I missed this, yea seems like a good idea. Let's leave it as a to-do.
Hope that Image Multiclass Classification Benchmark Demo.ipynb can be fixed soon. OVO