dsgibbons/shap

Separate out tests of optional dependencies

thatlittleboy opened this issue · 2 comments

Background

This issue originates from #30 and the subsequent PR #45 that resolved it.

The situation in #30 is problematic because shap should support numpy 1.24 (and indeed does; we merged PRs to remove deprecation warnings for that very purpose). Yet because of tensorflow (an optional dependency), we are forced to test in an environment with numpy 1.23.5, i.e. we might miss bugs that only appear in newer package versions.

Put very simply:

  • In our run_tests Actions pipeline, we 1) pip install ALL test dependencies (including optional dependencies like tensorflow, pytorch, catboost, etc.), then 2) run pytest over the whole test suite.
  • Because of 1), we only ever test in an environment that satisfies all of the optional requirements. This is not representative of how end users might use shap.
  • The particular instance where this becomes an issue is described in #30 and the excerpt above.

Solutions

I'd like to discuss some potential solutions. Here's one idea, just throwing it out there:

  • Split the testing of shap functionality that relies on optional dependencies into separate steps within the same run_tests Actions pipeline.
  • I'm proposing to keep the tests within the same stage so that we can reuse as many of the installed dependencies as possible. Alternatively, if we implement caching of package installations, this becomes a non-issue.
  • Then we could do something like pip install -e '.[test-core]' followed by python -m pytest -m core to run all the core tests. Of course, the prerequisite here is that we decorate every test function relying only on core functionality with @pytest.mark.core (see the sketch after this list).
  • Once that passes, we move on to pip install -e '.[test-torch]' followed by python -m pytest -m torch to run all pytorch-related tests, and so on.
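
Here is a minimal sketch of what the marker-based approach could look like. The file names, marker names (core, torch) and test bodies below are hypothetical, purely for illustration:

# conftest.py -- register the custom markers so pytest doesn't warn about unknown marks
def pytest_configure(config):
    config.addinivalue_line("markers", "core: tests needing only core dependencies")
    config.addinivalue_line("markers", "torch: tests requiring pytorch")

# tests/test_example.py -- hypothetical module showing how tests would be selected by marker
import pytest

import shap

@pytest.mark.core  # collected by: python -m pytest -m core
def test_core_only_functionality():
    # placeholder body; a real test would exercise shap against core dependencies only
    assert shap.__version__

@pytest.mark.torch  # collected by: python -m pytest -m torch
def test_pytorch_functionality():
    torch = pytest.importorskip("torch")  # safety net if this is run without torch installed
    assert torch.tensor([1.0]).item() == 1.0

With this in place, python -m pytest -m core would collect only the tests marked core, and python -m pytest -m torch only the pytorch ones.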

Obviously this comes with a number of downsides: 1) it complicates the testing setup and procedure in the local environment -- we would probably need to rely on scripts (via pdm/poetry, Makefiles, or the good-ol' bash scripts; a sketch of one option follows below); 2) it potentially lengthens the CI runtime; and 3) all or most of our test functions would need to be decorated with pytest markers, which is a huge change to make.
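
For the local workflow, one option (purely illustrative -- nox isn't currently part of this repo, and this reuses the hypothetical dependency groups and markers from the sketch above) would be a small task-runner script so contributors don't have to remember the per-group pip/pytest incantations:

# noxfile.py -- hypothetical task-runner script wrapping the per-group test commands
import nox

@nox.session
def core(session):
    session.install("-e", ".[test-core]")
    session.run("pytest", "-m", "core")

@nox.session
def torch(session):
    session.install("-e", ".[test-torch]")
    session.run("pytest", "-m", "torch")

Contributors would then run nox -s core or nox -s torch locally.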

So I'm not thrilled with this idea, but it would make our testing a bit more rigorous. Hence I would say this issue is low priority, but I'm really curious how other projects are dealing with this problem.

Here's an alternative idea that could keep the testing setup simple. We could add another stage to run the full test suite with the latest core dependencies only. The stage could run in parallel to the existing tests, to keep things performant.

Most tests use pytest.importorskip to import optional dependencies, e.g.:

tf = pytest.importorskip('tensorflow')

The tests in question will simply be skipped if the dependency isn't available, so we wouldn't need to annotate each test with pytest markers.
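
For instance, a test written like the hypothetical one below is reported as skipped, rather than erroring out, when tensorflow is missing from the environment, so a core-only run simply shows it in the skip summary:

# hypothetical test illustrating the skip behaviour
import pytest

def test_deep_explainer_tensorflow():
    # if tensorflow isn't installed, importorskip marks this test as skipped and stops here
    tf = pytest.importorskip('tensorflow')
    assert hasattr(tf, 'constant')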

I tried out implementing a separate stage as described above in #61, which seems to work as I hoped:

  • Installing just the core dependencies means we test against their latest versions, e.g. numpy 1.24.
  • Tests that require non-core dependencies are skipped without needing pytest markers.
  • The new job runs in about 4 minutes, so it finishes well before the other jobs that run in parallel.

Open to other ideas, though, if anyone has alternative solutions!