Fix notebook-style examples
GaelVaroquaux opened this issue · 47 comments
Describe the issue linked to the documentation
Many legitimate notebook-style examples have been broken, specifically by the following PR:
#9061
List of examples to update
Note for maintainers: the content between begin/end_auto_generated is updated automatically by a script. If you edit it by hand your changes may be reverted. The script doing the update: https://gist.github.com/lesteve/478d52599d394ec5e7f56dbf0827a5e9
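The following is a minimal, hypothetical Python sketch of how a marker-based update of this kind can work. It is not the code of the gist linked above, and the name replace_between_markers is made up for illustration. It keeps the begin/end marker lines and replaces only the lines between them, which is why hand edits inside the markers get overwritten on the next run.

```python
import re


def replace_between_markers(text, new_body,
                            begin="begin_auto_generated",
                            end="end_auto_generated"):
    """Hypothetical helper: swap the lines between two marker lines.

    The marker lines themselves are kept; only the body in between changes.
    Markers must start a line, so a mention of them inside prose is ignored.
    """
    pattern = re.compile(
        rf"(^{re.escape(begin)}[ \t]*\n)(.*?)(^{re.escape(end)}[ \t]*$)",
        flags=re.DOTALL | re.MULTILINE,
    )
    # A function replacement avoids escape processing of backslashes in new_body.
    updated, n_subs = pattern.subn(
        lambda m: m.group(1) + new_body.rstrip("\n") + "\n" + m.group(3),
        text,
        count=1,
    )
    if n_subs == 0:
        raise ValueError("begin/end markers not found")
    return updated
```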
Here are all the examples that use patterns like # ####### (found with ag -l '# #####*\s#' examples | sort; note that there may be false positives: for example, I removed examples/impute/plot_missing_values.py, which uses # %% but also # #### as title underlines):
begin_auto_generated
- examples/applications/plot_prediction_latency.py #22418
- examples/applications/plot_stock_market.py #22461
- examples/applications/wikipedia_principal_eigenvector.py #22704
- examples/calibration/plot_calibration.py #22734
- examples/classification/plot_lda_qda.py #22528
- examples/cluster/plot_affinity_propagation.py #22559
- examples/cluster/plot_coin_ward_segmentation.py #23164
- examples/cluster/plot_dbscan.py #22568
- examples/cluster/plot_dict_face_patches.py #22929
- examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py #22796
- examples/cluster/plot_mean_shift.py #22713
- examples/cluster/plot_mini_batch_kmeans.py #22900
- examples/cluster/plot_segmentation_toy.py #23140
- examples/cluster/plot_ward_structured_vs_unstructured.py #23228
- examples/covariance/plot_covariance_estimation.py #23150
- examples/covariance/plot_sparse_cov.py #22807
- examples/cross_decomposition/plot_compare_cross_decomposition.py #23365
- examples/decomposition/plot_faces_decomposition.py #22452
- examples/decomposition/plot_ica_blind_source_separation.py #23365
- examples/decomposition/plot_ica_vs_pca.py #23106
- examples/decomposition/plot_image_denoising.py #22739
- examples/decomposition/plot_pca_3d.py #23064
- examples/decomposition/plot_pca_vs_fa_model_selection.py #23148
- examples/exercises/plot_cv_diabetes.py #22740
- examples/feature_selection/plot_feature_selection.py #22437
- examples/linear_model/plot_ard.py #22481
- examples/linear_model/plot_bayesian_ridge_curvefit.py #22916
- examples/linear_model/plot_bayesian_ridge.py #22916 #22794
- examples/linear_model/plot_lasso_and_elasticnet.py #22423
- examples/linear_model/plot_lasso_dense_vs_sparse_data.py #22789
- examples/linear_model/plot_logistic_path.py #22536
- examples/linear_model/plot_multi_task_lasso_support.py #23365
- examples/linear_model/plot_ols_3d.py #22547
- examples/linear_model/plot_ridge_path.py #23209
- examples/linear_model/plot_theilsen.py #23002
- examples/miscellaneous/plot_kernel_ridge_regression.py #22804
- examples/model_selection/grid_search_text_feature_extraction.py #22558
- examples/model_selection/plot_roc_crossval.py #22799
- examples/model_selection/plot_train_error_vs_test_error.py #22440
- examples/neighbors/plot_regression.py #22416
- examples/neural_networks/plot_rbm_logistic_classification.py #23104
- examples/semi_supervised/plot_label_propagation_digits.py #22725
- examples/semi_supervised/plot_label_propagation_structure.py #22726
- examples/svm/plot_rbf_parameters.py #22724
- examples/svm/plot_svm_anova.py #22779
- examples/svm/plot_svm_regression.py #22534
- examples/text/plot_document_clustering.py #22443
end_auto_generated
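If ag is not available, the search used to build the list above can be approximated in Python. This is a minimal sketch assuming a scikit-learn checkout with an examples/ directory; unlike the ag command, it only scans .py files and does not honor .gitignore, and both helper names are hypothetical.

```python
import re
from pathlib import Path

# Same regex as the ag command: "# " followed by at least four "#", then one
# whitespace character (which may be a newline) and another "#".
OLD_STYLE_PATTERN = re.compile(r"# #####*\s#")


def uses_old_style_separator(source):
    """Return True if ``source`` contains the old '# ####...' pattern."""
    return OLD_STYLE_PATTERN.search(source) is not None


def find_old_style_examples(root="examples"):
    """Return the sorted POSIX paths of .py files under ``root`` that match."""
    return sorted(
        path.as_posix()
        for path in Path(root).rglob("*.py")
        if uses_old_style_separator(path.read_text(encoding="utf-8"))
    )
```

Run from the repository root, print("\n".join(find_old_style_examples())) should list roughly the same files as the ag command, including the same kind of false positives.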
Suggest a potential alternative/fix
The examples need to be reviewed on a case-by-case basis to determine whether or not they are "notebook-style", as in https://sphinx-gallery.github.io/stable/tutorials/plot_parse.html#sphx-glr-tutorials-plot-parse-py. In general, we should favor notebook-style examples, which are typically more readable.
We should probably favor the "# %%" syntax over long lines of "###":
https://sphinx-gallery.github.io/stable/syntax.html#embed-rst-in-your-example-python-files
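As a rough first pass, old-style separator lines could be rewritten to # %% with a regular expression. This is a hypothetical helper, not an existing scikit-learn or sphinx-gallery tool: it assumes that a line consisting only of "# " followed by four or more "#" (the same minimum as the ag pattern above) is a block separator. It cannot tell a separator from a title underline such as the ones in examples/impute/plot_missing_values.py, so every converted file still needs the case-by-case review described above.

```python
import re

# Hypothetical heuristic: a whole line made of "# " plus four or more "#"
# (optionally followed by trailing spaces) is treated as a block separator.
OLD_SEPARATOR_LINE = re.compile(r"^# #{4,}[ \t]*$", flags=re.MULTILINE)


def convert_separators(source):
    """Replace old-style '# ####...' separator lines with '# %%'."""
    return OLD_SEPARATOR_LINE.sub("# %%", source)
```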
In terms of review workflow, it is useful to send several small PRs rather than one big PR, which would be harder to review.
Pieces of advice if you are interested in working on this issue
- Pick a file to get started with, e.g. https://github.com/scikit-learn/scikit-learn/blob/main/examples/semi_supervised/plot_label_propagation_digits.py and its rendered HTML in the doc
- Mention which file you are working on in a comment so that different people can work in parallel on different files
- Use # %% as a cell separator where you think it is appropriate. For example, in https://github.com/scikit-learn/scikit-learn/blob/main/examples/semi_supervised/plot_label_propagation_digits.py I would say that some comments, like "Pick the top 10 most uncertain labels", should not get their own cell, whereas others, like "Plot", should be turned into a title.
- Look at an existing notebook-style example for inspiration, e.g. https://github.com/scikit-learn/scikit-learn/blob/main/examples/manifold/plot_swissroll.py and its rendered HTML in the doc
- Look at the contributing doc to learn how to build the example HTML locally: https://scikit-learn.org/stable/developers/contributing.html. In particular,
EXAMPLES_PATTERN=plot_label_propagation_digit make html
will only run the examples with plot_label_propagation_digit in their filenames. This makes it quicker to generate the doc for only the example you are working on and to check the HTML rendering locally.
- Once you create a PR, check the CircleCI status at the bottom of the page and click on "Details"
- This should bring you to a page where you can click on the name of the example you modified; check that the doc for this example looks right.
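To get a quick feel for how a file will be cut into cells before building the doc, the sketch below splits a script on # %% lines. It is only a rough approximation of sphinx-gallery's parsing, and split_cells is a hypothetical name: after each marker, the leading run of comment lines is treated as a text block and the rest as code, and everything before the first marker is treated as code, whereas sphinx-gallery renders the module docstring as the page header. Building the doc as described above remains the way to check the real rendering.

```python
import re

# Accept both "# %%" and "#%%" at the start of a line as a cell marker.
CELL_MARKER = re.compile(r"^# ?%%")


def split_cells(source):
    """Split a notebook-style script into a list of (kind, text) blocks."""
    cells = [[]]
    for line in source.splitlines():
        if CELL_MARKER.match(line):
            cells.append([])  # a marker starts a new cell
        else:
            cells[-1].append(line)

    blocks = []
    for index, lines in enumerate(cells):
        text_lines = []
        # Only cells after a marker get a text block in this sketch.
        if index > 0:
            while lines and lines[0].startswith("#"):
                comment = lines.pop(0)
                # Drop "# " (or a bare "#") to recover the reST text.
                if comment.startswith("# "):
                    text_lines.append(comment[2:])
                else:
                    text_lines.append(comment[1:])
        code = "\n".join(lines).strip()
        if text_lines:
            blocks.append(("text", "\n".join(text_lines)))
        if code:
            blocks.append(("code", code))
    return blocks
```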
Working on this
I have added a list of examples that use a pattern like # ##### for comments. I think it is best to create one PR per example; this will make the changes easier to review.
@GaelVaroquaux Do you mean that we wish to standardize the block splitter (# %% preferred, or # ##) across all relevant scripts?
/take
@GaelVaroquaux Makes sense! How would you recommend that I help with this?
@ss-is-master-chief I have added a "Pieces of advice if you are interested in working on this issue" section in the top post to help you (and potentially other interested people) get started. If something is not clear, let us know!
I want to help.
Working on examples/applications/plot_stock_market.py
Working on examples/linear_model/plot_ard.py
Working on examples/classification/plot_lda_qda.py
working on examples/svm/plot_svm_regression.py
working on examples/svm/plot_svm_anova.py
Working on examples/linear_model/plot_ols_3d.py
Working on examples/cluster/plot_dbscan.py
Working on examples/cluster/plot_mean_shift.py
Working on wikipedia_principal_eigenvector.py
Working on examples/svm/plot_rbf_parameters.py
working on examples/semi_supervised/plot_label_propagation_digits.py and examples/semi_supervised/plot_label_propagation_structure.py
Working on plot_calibration.py
Working on plot_image_denoising.py
Working on plot_cv_diabetes.py
Working on plot_svm_anova.py
Working on linear_model/plot_lasso_dense_vs_sparse_data.py
I am planning to work on examples/linear_model/plot_bayesian_ridge.py
Starting to work on examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py
Starting to work on examples/miscellaneous/plot_kernel_ridge_regression.py
Starting to work on examples/model_selection/plot_roc_crossval.py
Working on examples/covariance/plot_sparse_cov.py
Working on examples/cluster/plot_mini_batch_kmeans.py
Working on examples/cluster/plot_coin_ward_segmentation.py
Working on examples/cluster/plot_dict_face_patches.py
Working on examples/linear_model/plot_ridge_path.py
Working on examples/linear_model/plot_theilsen.py
Working on examples/cluster/plot_segmentation_toy.py