data-apis/scipy-2023-presentation

Motivating example

asmeurer opened this issue · 3 comments

Do we have a good motivating example for the talk/paper? We have @AnirudhDagar's SciPy demo (scipy/scipy@main...AnirudhDagar:scipy:array-api-demo) as well as @thomasjpfan's scikit-learn PR (https://github.com/scikit-learn/scikit-learn/pull/22554/files), and I could crib the relevant parts from those diffs. Or should we come up with a standalone script that does something? Some good things to show in the example would be:

  • That the majority of NumPy-like code will remain unchanged (other than np -> xp).
  • Use of array_namespace at the top of the function.
  • Some functions are renamed (e.g., NumPy's concatenate -> concat in the spec).
  • Some functions aren't included and have to be worked around.
  • Some NumPy behaviors aren't guaranteed by the spec, so code relying on them should be written more portably (e.g., explicitly indexing every axis, avoiding implicit cross-kind casting, not passing Python scalars to functions, not using int dtypes for floating-point functions).
  • Some libraries may need to be special-cased for performance purposes.

I can demonstrate all of these using the SciPy and scikit-learn PRs above. So the question is whether it's better to show actual real-world usage or to make the example more coherent and self-contained.
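For the self-contained option, a minimal sketch could look something like the following. It is not taken from either PR. The `standardize` function and the `get_namespace` helper are hypothetical names made up for illustration, and the helper is only a stand-in for `array_namespace` so that the snippet runs with NumPy alone. It shows the namespace lookup at the top of the function and a body that is otherwise unchanged NumPy-like code apart from np -> xp.

```python
import numpy as np


def get_namespace(x):
    """Hypothetical stand-in for array_api_compat.array_namespace.

    Uses the standard's __array_namespace__ hook when the array provides it
    and falls back to the numpy module otherwise (an assumption made so the
    sketch runs with NumPy alone).
    """
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def standardize(x):
    """Center and scale each column of a 2-D floating-point array."""
    xp = get_namespace(x)  # look up the namespace once, at the top
    # From here on the code reads like NumPy code with np replaced by xp.
    mean = xp.mean(x, axis=0, keepdims=True)
    std = xp.std(x, axis=0, keepdims=True)
    return (x - mean) / std


# Floating-point input, so no int dtype reaches a floating-point function.
data = np.asarray([[1.0, 10.0], [3.0, 30.0]])
result = standardize(data)
```

With array-api-compat installed, the helper would presumably be dropped in favor of `from array_api_compat import array_namespace`, and the same function body could then be called with arrays from other supported libraries.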

And we'll definitely mention the SciPy and scikit-learn efforts later, regardless of the example we choose.

For talks, I usually try to start with "something interesting enabled by the new tech" to get everyone excited. In this case: "Look at all the benefits SciPy and scikit-learn get from using the Array API". For scikit-learn, we have docs on Array API usage, and this benchmark notebook to show the performance benefits.

Afterwards, one can dive into the implementation details of getting the Array API to work in the attendees' own projects. For those details, I think it's better to be coherent and self-contained. This can include real-world usage if it is self-contained enough.

By the way, if you have any benchmarks with graphs, or anything else with a nice figure we could include, that would be useful.

I reran benchmarks with scikit-learn comparing CuPy, PyTorch+GPU, PyTorch+CPU, and NumPy. This gist contains a notebook and a CSV file with the results.