Improve test suite execution speed
connortann opened this issue · 2 comments
I think there are a few areas for improvement in the GitHub Actions test suite that we could address to improve execution speed. Currently the unit tests take almost 20 minutes to run on CI. Reducing that would shorten the time it takes to validate PRs, improving our effectiveness as reviewers.
TODO
- Refactor the slowest tests to run faster (see below)
- Cache the installation of dependencies, which currently takes several minutes in each test job.
- Investigate parallelising the tests to run on multiple CPUs (see the sketch after this list).
  - WIP attempt #41, abandoned as it seemed to actually make the tests run even slower on CI.
- Get Codecov working, to help us evaluate relevant test coverage when reviewing PRs.
  - Needs the test suite to be passing, as per #4
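On the parallelisation point, here is a minimal sketch of the kind of job step that was tried, using pytest-xdist (the `-n auto` flag is standard pytest-xdist usage; the step layout is illustrative):

```yaml
# Illustrative CI step for the parallelisation experiment (#41).
# pytest-xdist's "-n auto" spawns one worker per available CPU; on the
# small GitHub-hosted runners the scheduling overhead can outweigh the
# gains, which would explain why CI runs ended up slower.
steps:
  - run: pip install pytest-xdist
  - run: pytest -n auto tests/
```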
Slowest tests
[Updated] Here is the current set of slowest tests:
```
============================= slowest 20 durations =============================
55.60s call tests/explainers/test_partition.py::test_translation
48.50s call tests/explainers/test_partition.py::test_translation_auto
48.44s call tests/explainers/test_partition.py::test_translation_algorithm_arg
47.71s call tests/explainers/test_partition.py::test_serialization
46.61s call tests/explainers/test_partition.py::test_serialization_custom_model_save
43.77s call tests/explainers/test_partition.py::test_serialization_no_model_or_masker
40.41s call tests/explainers/test_gradient.py::test_pytorch_mnist_cnn
```
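For reference, this listing is pytest's built-in duration report, which a CI step along these lines reproduces:

```yaml
# Print the 20 slowest tests at the end of the run.
steps:
  - run: pytest --durations=20 tests/
```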
Caching dependencies: experiment notes
Comparison of various options I've tried for caching dependencies.
N.B. we can view & manage caches via the UI: https://github.com/dsgibbons/shap/actions/caches
Repository caches are limited to 10GB.
Timings
| Env | Baseline | 1: Cache pip | 2: Cache whole env | 3: Cache some libs |
|---|---|---|---|---|
| py3.7 | 4m 14s | 1m 34s | 3m 15s | |
| py3.8 | 5m 6s | 1m 50s | 3m 4s | |
| py3.9 | 4m 25s | 4m 34s | 2m 25s | 2m 56s |
| py3.10 | 4m 30s | 4m 41s | 1m 44s | 2m 51s |
| py3.11 | 4m 42s | 5m 17s | 2m 42s | 2m 51s |
| Average | 4m 35s | 4m 50s | 2m 3s | 3m 35s |
Approaches
0. Baseline
The existing approach: a plain pip install with no caching.
1. Enable caching in the setup-python action
Caches the downloaded wheels, but not the installed environment, as per the action docs.
- Implementation
- Example run
- Result: 15s slower
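A minimal sketch of this approach, assuming actions/setup-python@v4 (the `cache: pip` input is documented by the action; the install command and extras are illustrative):

```yaml
# Approach 1: let setup-python cache pip's wheel cache between runs.
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    with:
      python-version: "3.11"
      cache: "pip"  # caches downloaded wheels, keyed on the project's dependency files
  - run: pip install -e ".[test]"  # a full install still runs on every job
```

This would explain the poor result: wheels come from the cache, but every package is still installed (and built where necessary) from scratch on each run.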
2. Cache the whole Python env
As per this blog
- Each env cache is ~3GB, so caching all envs would exceed the 10GB repository limit
- Example run
- Implementation
- Result: ~2x speedup, but only space to cache 3 of the 5 envs.
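A sketch of the whole-env approach, roughly following the blog's pattern (the paths and cache key are illustrative; `pythonLocation` is an environment variable exported by setup-python):

```yaml
# Approach 2: cache the entire Python installation and skip pip on a cache hit.
steps:
  - uses: actions/setup-python@v4
    with:
      python-version: "3.11"
  - uses: actions/cache@v3
    id: cache-env
    with:
      # The whole env, including site-packages (~3GB per env here)
      path: ${{ env.pythonLocation }}
      key: env-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}
  - if: steps.cache-env.outputs.cache-hit != 'true'
    run: pip install -e ".[test]"
```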
3. Cache specific libraries in site-packages
Cache only the libraries which need to be built, such as pyspark, and leave the other libs to be pip-installed as before.
To decide which packages to cache, we want to save the most time whilst keeping under ~2GB total cache size per env. Some measurements from experimentation, sorted by those that save the most time for the least space (e.g. pyspark: 12s / 310MB ≈ 0.039 s/MB):
| Package | Size (MB) | Build time (s) | s / MB |
|---|---|---|---|
| site-packages/pyspark* | 310 | 12 | 0.039 |
| site-packages/nvidia* | 1521 | 40 | 0.026 |
| site-packages/torch* | 619 | 13 | 0.021 |
| site-packages/tensorflow* | 586 | 12 | 0.020 |
| site-packages/xgboost* | 200 | 4 | 0.020 |
So, the decision is to cache just the first three libraries (pyspark, nvidia and torch). In future, if we drop support for any Python versions, we can cache more libraries.
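A sketch of what caching those three patterns might look like (the paths and key are illustrative; `actions/cache` accepts glob patterns in `path`):

```yaml
# Approach 3: cache only the heavy, slow-to-build libraries;
# everything else is pip-installed as before.
steps:
  - uses: actions/setup-python@v4
    with:
      python-version: "3.11"
  - uses: actions/cache@v3
    with:
      path: |
        ${{ env.pythonLocation }}/lib/python3.11/site-packages/pyspark*
        ${{ env.pythonLocation }}/lib/python3.11/site-packages/nvidia*
        ${{ env.pythonLocation }}/lib/python3.11/site-packages/torch*
      key: libs-${{ runner.os }}-py3.11-${{ hashFiles('setup.py') }}
  - run: pip install -e ".[test]"  # restored libs (and their dist-info) look installed, so pip skips them
```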
Implementing the options in PR #84.
Ported to shap#3045