thebjorn/pydeps

Question: has anybody used pydeps for selective testing?

Closed this issue · 3 comments

Hey all,

I'm considering using pydeps for selective testing on pull requests. The idea would be to compute the dependency graph and, compared to master, only run the tests for modules that have changed or whose dependencies have changed.
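Roughly, I imagine something like the sketch below (purely illustrative: the helper names are made up, and the module-level dependency graph could come from pydeps or anywhere else):

    # Sketch only: select test targets as the reverse transitive closure of the
    # changed modules over a module-level dependency graph.
    import subprocess

    def changed_modules(base="master"):
        """Changed .py files vs. the base branch, as dotted module names."""
        out = subprocess.run(
            ["git", "diff", "--name-only", base, "--", "*.py"],
            capture_output=True, text=True, check=True,
        ).stdout
        return {p[:-3].replace("/", ".") for p in out.splitlines() if p}

    def modules_to_test(deps, changed):
        """deps: module -> set of modules it imports. Returns the changed
        modules plus everything that imports them, directly or transitively."""
        importers = {}  # invert the graph: imported module -> its importers
        for mod, imports in deps.items():
            for imp in imports:
                importers.setdefault(imp, set()).add(mod)
        selected, stack = set(changed), list(changed)
        while stack:
            for dependant in importers.get(stack.pop(), ()):
                if dependant not in selected:
                    selected.add(dependant)
                    stack.append(dependant)
        return selected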

A few false negatives are OK; it's fine for it to just work pretty well most of the time.

Has anybody tried this? I searched around but couldn't find anything...

kinow commented

That sounds like an interesting idea @harrybiddle. I think the traditional approach for selective testing is to use coverage. But running tests based on the dependency graph could be useful too.

Maybe that could be an extension to pytest-testmon, or another pytest- module.

I think one downside of this approach would be with modules that are complex, with lots of code: a small change could trigger many tests. That could also serve as an indicator that the module's complexity should be reduced? But you already said it's OK for it to work well most of the time, so that should be fine.

Just my $0.02 👍

We have an internal tool that tries to create the inter-package dependency graph, which is what you need to do this.

It uses pydeps --externals app to find the external modules that app imports directly (bacon==1). E.g. for our dksync app:

(dev) go|c:\srv\lib\code\dksync> pydeps --externals dksync
[
    "builtins",
    "ctypes",
    "dkfileutils",
    "dkrepo",
    "dkrunners",
    "multiprocessing_logging",
    "past",
    "pkg_resources",
    "screen",
    "setuptools",
    "yaml"
]
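Collecting that list programmatically is straightforward; a minimal sketch, assuming the output is plain JSON as in the listing above (the function name is just illustrative):

    import json
    import subprocess

    def externals(package):
        """External modules `package` imports, per pydeps --externals."""
        out = subprocess.run(
            ["pydeps", "--externals", package],
            capture_output=True, text=True, check=True,
        ).stdout
        return json.loads(out)

    # e.g. externals("dksync") -> ["builtins", "ctypes", "dkfileutils", ...]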

It then creates dependencies:

    builtins -> dksync
    ctypes -> dksync
    dkfileutils -> dksync
    etc.

Having collected dependencies for all packages, any package that ever appears on the right-hand side is a "local" package, and everything else is a "remote" package. Removing all relations containing "remote" packages leaves you with the dependency graph of your own packages (it's unfortunately not quite that simple, real life is messy, but conceptually this is what we do).
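Conceptually, the filtering step looks something like this sketch (assuming you have the --externals output for every package you own; names are illustrative):

    def local_dependency_graph(externals_by_pkg):
        """externals_by_pkg: package -> list of external modules it imports.
        Returns dep -> set of local packages that depend on it."""
        local = set(externals_by_pkg)          # every scanned package is "local"
        graph = {pkg: set() for pkg in local}
        for pkg, ext_modules in externals_by_pkg.items():
            for dep in ext_modules:
                if dep in local:               # drop relations with "remote" packages
                    graph[dep].add(pkg)        # e.g. dkfileutils -> dksync
        return graph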

Be careful about uncritically setting downstream pipelines to run automatically in your CI environment. E.g. for our "code that works on code" group of modules we have the following inter-package dependency graph:
[image: inter-package dependency graph of the "code that works on code" modules]

You'll want a change to dkpkg to only run dktools tests once, not 12 times. If later tests depend on artifacts from earlier tests (e.g. wheels), you'll need to run the tests in a topologically sorted order (as opposed to depth-first, which is likely what a naive trigger would do).
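With Python 3.9+ the standard library can do the topological sort for you; a sketch on top of the dep -> dependants graph from above:

    from graphlib import TopologicalSorter   # stdlib, Python 3.9+

    def pipeline_order(graph):
        """graph: dep -> set of packages that depend on it.
        Returns packages ordered so dependencies are built/tested first."""
        requires = {pkg: set() for pkg in graph}   # invert: package -> its deps
        for dep, dependants in graph.items():
            for pkg in dependants:
                requires.setdefault(pkg, set()).add(dep)
        return list(TopologicalSorter(requires).static_order())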

To speed things up, the pydeps --externals step can be run during a package's own pipeline when e.g. a line containing import shows up in the diff, so the collection step doesn't have to wait for pydeps to run on dozens or more packages...
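The "does the diff touch an import line" check can be as crude as this sketch (illustrative only):

    import re
    import subprocess

    def imports_changed(package_dir, base="master"):
        """True if any added/removed line in the diff looks like an import."""
        diff = subprocess.run(
            ["git", "diff", base, "--", package_dir],
            capture_output=True, text=True, check=True,
        ).stdout
        return any(re.match(r"^[+-]\s*(import|from)\s+\w", line)
                   for line in diff.splitlines())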

For ~dozens of packages this works pretty well and the graphs are relatively easy to reason about. When you get close to 100 local packages you'll need to find ways of segmenting the graph (and e.g. let the nightly builds handle inter-segment dependencies).

Hello everybody, thank you for your input; you've encouraged me that this isn't a crazy thing to try. This isn't really an open issue for pydeps, so I will close the ticket, but if I manage to implement anything in this space (which I would most likely do as an open-source pytest plugin) I will report back here, and hopefully others will be able to find this ticket in the future.