dsgibbons/shap

Improve test suite execution speed

connortann opened this issue · 2 comments

I think there are a few areas of the GitHub Actions test suite that we could address to improve execution speed. Currently the unit tests take almost 20 minutes to run on CI. Reducing that would shorten the time it takes to validate PRs, improving our effectiveness as reviewers.

TODO

  • Refactor the slowest tests to run faster (see below)
  • Cache the installation of the dependencies, which currently takes several minutes in each test job.
    • A prerequisite is having reproducible environments, as per #30.
    • Implemented in #84, subject to the total cache size limit.
  • Investigate parallelising the tests to run on multiple CPUs.
    • WIP attempt in #41, abandoned as it seemed to make the tests run even slower on CI (see the sketch after this list).
  • Get CodeCov working, to help us evaluate relevant test coverage when reviewing PRs.
    • Needs the test suite to be passing, as per #4.
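
For reference, the standard way to parallelise pytest across CPUs is the pytest-xdist plugin. A minimal CI step might look like the sketch below; this is only an illustration of the general approach, not necessarily what #41 did.

```yaml
# Illustrative sketch only: parallelise the test run with pytest-xdist.
# "-n auto" spawns one worker per available CPU core.
- name: Run tests in parallel
  run: |
    pip install pytest-xdist
    pytest -n auto
```

One possible explanation for the slowdown seen in #41 is that every xdist worker re-imports the heavy ML libraries (torch, tensorflow, etc.), so the per-worker start-up cost can outweigh the gain on small CI runners.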

Slowest tests

[Updated] Here is the current set of slowest tests:

============================= slowest 20 durations =============================
55.60s call     tests/explainers/test_partition.py::test_translation
48.50s call     tests/explainers/test_partition.py::test_translation_auto
48.44s call     tests/explainers/test_partition.py::test_translation_algorithm_arg
47.71s call     tests/explainers/test_partition.py::test_serialization
46.61s call     tests/explainers/test_partition.py::test_serialization_custom_model_save
43.77s call     tests/explainers/test_partition.py::test_serialization_no_model_or_masker
40.41s call     tests/explainers/test_gradient.py::test_pytorch_mnist_cnn
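
For reference, a report like the one above comes from pytest's built-in --durations flag; a CI step along these lines (the step name is illustrative) reproduces it:

```yaml
# Illustrative: ask pytest to print the 20 slowest tests at the end of the run.
- name: Run tests with a duration report
  run: pytest --durations=20 tests/
```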

Caching dependencies: experiment notes

Comparison of various options I've tried for caching dependencies.

NB: caches can be viewed and managed via the UI: https://github.com/dsgibbons/shap/actions/caches

Repository caches are limited to 10GB in total.

Timings

Env       Baseline   1: Cache pip   2: Cache whole env   3: Cache some libs
py3.7     4m 14s     -              1m 34s               3m 15s
py3.8     5m 6s      -              1m 50s               3m 4s
py3.9     4m 25s     4m 34s         2m 25s               2m 56s
py3.10    4m 30s     4m 41s         1m 44s               2m 51s
py3.11    4m 42s     5m 17s         2m 42s               2m 51s
Average   4m 35s     4m 50s         2m 3s                3m 35s

(- = no timing recorded)

Approaches

0. Baseline

Existing approach, just pip-install with no caching.

1. Enable cache in the setup-python action

This caches the downloaded wheels, but not the installed environment, as per the action docs.
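
A minimal sketch of this option, assuming a recent actions/setup-python and a Python version matrix (the exact versions and install command in our workflows may differ):

```yaml
# Sketch: let setup-python cache downloaded wheels between runs.
# Only the downloads are cached; pip still has to install (and build,
# where needed) every package on every run.
- uses: actions/setup-python@v4
  with:
    python-version: ${{ matrix.python-version }}
    cache: "pip"
- name: Install dependencies
  run: pip install ".[test]"   # illustrative; the real extras may differ
```

This is consistent with the timings above, where the pip cache gives essentially no speedup: installation time, not download time, dominates.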

2. Cache the whole python env

As per this blog

  • Each env cache is ~3GB, so caching all five envs would exceed the 10GB repository limit
  • Example run
  • Implementation
  • Result: ~2x speedup, but only enough space to cache 3 of the 5 envs.
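
Roughly what this looks like (the paths and cache key are illustrative; the linked implementation is authoritative):

```yaml
# Sketch: cache the whole virtualenv and skip the install entirely on a cache hit.
- uses: actions/cache@v3
  id: venv-cache
  with:
    path: .venv
    key: venv-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('pyproject.toml') }}
- name: Install dependencies
  if: steps.venv-cache.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    .venv/bin/pip install ".[test]"
```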

3. Cache specific libraries in site-packages

Cache only the libraries which need to be built, such as pyspark; leave the other libs to be pip-installed as before.

To decide which packages to cache: we want to save the most time whilst keeping under ~2GB total cache size per env. Below are some measurements from experimentation, sorted by install time saved per MB of cache:

Package                      Size (MB)   Build time (s)   s / MB
site-packages/pyspark*       310         12               0.039
site-packages/nvidia*        1521        40               0.026
site-packages/torch*         619         13               0.021
site-packages/tensorflow*    586         12               0.020
site-packages/xgboost*       200         4                0.020

So, decide to cache just the first three libraries (pyspark, nvidia, torch). In future, if we drop support for any Python versions, we can cache more libraries.
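
A rough sketch of what this could look like (the glob paths and key are illustrative; #84 has the actual implementation):

```yaml
# Sketch: cache only the heaviest libraries inside site-packages and let pip
# install everything else as usual. pythonLocation is exported by setup-python.
- uses: actions/cache@v3
  with:
    path: |
      ${{ env.pythonLocation }}/lib/python*/site-packages/pyspark*
      ${{ env.pythonLocation }}/lib/python*/site-packages/nvidia*
      ${{ env.pythonLocation }}/lib/python*/site-packages/torch*
    key: site-packages-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('pyproject.toml') }}
- name: Install dependencies
  run: pip install ".[test]"   # should skip anything already satisfied from the cache
```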

Implementing these options in PR #84.

Ported to shap#3045