scientific-python/lazy-loader

Optional package features

effigies opened this issue · 6 comments

I'd like to run a couple of ideas past this group, mostly to decide whether it makes more sense to help implement features here or to update our similar feature to use lazy loading under the hood. The use case I have in mind is optional (extra) dependencies; see the next (collapsed) section for details.

The features I'm proposing are:

  1. Version constraints (I'm thinking packaging.requirements arbitrary strings, but minimum versions are probably >90% of the use case)
  2. Boolean "found package" return value

Here is a quick API proposal to demonstrate:

import lazy_loader as lazy
h5py, have_h5py = lazy.load_requirement('h5py>=3.0')

Here h5py would be a lazy module, which will fail if either h5py is not installed or its version does not satisfy >=3.0. have_h5py will be a boolean that can be used to determine whether accessing attributes of h5py will fail without attempting the load.

I think this would in practice require adding importlib_metadata (for Python <3.9) and packaging to the dependencies, so it might be out of scope if you're trying to keep this absolutely minimal.
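To make the "found package" boolean concrete, here is a minimal standard-library sketch of the check behind have_h5py. It is only an illustration, not lazy_loader's API: the names parse_min_requirement, version_at_least and have_requirement are hypothetical, the parser is a simplified stand-in for packaging.requirements that understands only `name` and `name>=X.Y`, pre-release suffixes are ignored, and it assumes the distribution name matches the import name (true for h5py).

```python
import importlib.metadata
import re


def parse_min_requirement(requirement):
    """Split 'name' or 'name>=X.Y' into (name, minimum version tuple)."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.-]+)\s*(?:>=\s*(\d+(?:\.\d+)*))?\s*", requirement
    )
    if match is None:
        raise ValueError(f"unsupported requirement string: {requirement!r}")
    name, version = match.groups()
    minimum = tuple(int(part) for part in version.split(".")) if version else ()
    return name, minimum


def version_at_least(installed, minimum):
    """Compare the leading numeric components of `installed` with `minimum`."""
    match = re.match(r"\d+(?:\.\d+)*", installed)
    found = tuple(int(part) for part in match.group().split(".")) if match else ()
    # Pad with zeros so that '3' and '3.0' compare as equal.
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    minimum += (0,) * (width - len(minimum))
    return found >= minimum


def have_requirement(requirement):
    """True if the distribution is installed and new enough; imports nothing."""
    name, minimum = parse_min_requirement(requirement)
    try:
        installed = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return False
    return version_at_least(installed, minimum)
```

With this, have_requirement('h5py>=3.0') answers the question from installed metadata alone, so the module itself is never imported. Supporting arbitrary specifiers is the part that would need packaging.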


Click for some details about what I would use this API to replace

We've been doing something similar to lazy.load in nibabel.optpkg.optional_package for a while, that looks like this:

from nibabel.optpkg import optional_package
h5py, have_h5py, setup_module = optional_package('h5py', min_version='3.0')

h5py is either the fully loaded h5py module or a "tripwire" object that will fail when you attempt to access an attribute. So it is (currently) only lazy in the sense that an ImportError does not occur immediately (and will be replaced by a more appropriate informative error message). But from a namespace standpoint, lazy.load('h5py') and optional_package('h5py')[0] are interchangeable.
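For readers unfamiliar with the "tripwire" idea, a minimal sketch follows. It is not nibabel's actual implementation: the class names, the error message and the two-element return value are assumptions made for illustration, and version checking is left out.

```python
import importlib


class TripWireError(AttributeError):
    """Raised when an attribute of a missing optional package is accessed."""


class TripWire:
    """Placeholder object that fails on any attribute access."""

    def __init__(self, message):
        self._message = message

    def __getattr__(self, name):
        # Only called for attributes that are not found normally,
        # so self._message itself stays accessible.
        raise TripWireError(self._message)


def optional_package(name):
    """Return (module, True) if importable, else (TripWire, False)."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        message = f"Optional package {name!r} is required for this feature"
        return TripWire(message), False
    return module, True
```

Unlike lazy.load, this sketch imports the package eagerly when it is available; only the failure is deferred.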

Importantly for the optional dependency use case, an installed version that is lower than the required version is treated as a failure to import. This can come up periodically, and a dependency mismatch is only relevant to the user or downstream tool if they are accessing h5py functionality in nibabel which might require a higher version. (The error message, of course, explains the issue.)

The other feature is have_<module>, which is a simple bool. I personally use it mainly for tests like @unittest.skipUnless(have_h5py, reason="Needs h5py"), but you can imagine other cases where it's more convenient than accessing the module and allowing/catching an exception.
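As a rough illustration of that test pattern, the sketch below computes the flags with importlib.util.find_spec rather than with optional_package, so that it runs on its own. The second flag, the test names and the run_suite helper are hypothetical, and the second package name is deliberately one that should not exist.

```python
import importlib.util
import unittest

# Availability flags computed without importing the packages themselves.
have_h5py = importlib.util.find_spec("h5py") is not None
have_missing = importlib.util.find_spec("no_such_optional_dependency") is not None


class TestOptionalFeatures(unittest.TestCase):
    @unittest.skipUnless(have_h5py, "Needs h5py")
    def test_h5py_feature(self):
        import h5py  # only reached when the flag is True

        self.assertTrue(hasattr(h5py, "File"))

    @unittest.skipUnless(have_missing, "Needs a package that is not installed")
    def test_missing_feature(self):
        self.fail("never runs, because the decorator skips it")


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalFeatures)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests whose dependency is absent are reported as skipped rather than failed, which is the point of exposing the boolean separately from the module.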

Finally, setup_module is just a convenience that will raise a SkipTest() exception so a module that can't be tested at all without the dependency will be skipped by tools like nose or pytest. (Often we will use h5py, have_h5py, _ = optional_package('h5py') if we don't want to skip the entire module.) This is almost entirely unused even internally, so I'm not proposing to add it here.

I'm wondering if this is the right place to check for dependencies. These requirements are usually handled by pip, conda, etc. Do you know of packages that check, at runtime, whether their installation requirements are met? My gut feel is that it is out of scope for the lazy loader, but then I also haven't thought about it much.

Do you know of packages that check, at runtime, whether their installation requirements are met?

Nilearn is the main one I know of that is so comprehensive: __init__.py and version.py

In practice, I would say most tools I've seen import install_requires dependencies at the tops of modules. So import nibabel will fail if you somehow got it in an environment without numpy. And then extras_require dependencies would either be guarded like

try:
    import h5py
    have_h5py = True
except ImportError:
    have_h5py = False

Or imported in function scopes:

def myfunc():
    import h5py  # Raises ImportError at call time if h5py is missing

optional_package is nibabel's compromise: it lets you put h5py, have_h5py, _ = optional_package('h5py') at the module head but defers the exception until you try to access h5py, which it turns out lazy.load() also does.

Just as a proof-of-concept, I could rewrite it:

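import importlib.util
import lazy_loader as lazy

def optional_package(pkg):
    mod = lazy.load(pkg)  # lazy module; a missing package only fails on attribute access
    # find_spec locates the package without importing it (pkgutil.find_loader is deprecated)
    have_mod = importlib.util.find_spec(pkg) is not None
    return mod, have_mod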

Setting a minimum version would add some complication, as we would need to return something besides lazy.load() if the module exists but is too old.

If this would be useful to nipy, I think we can consider adding the feature, but only if it does not add much complexity.

If there's no opposition, I can try to put together a PR in the next month or so and see how intrusive it is.

Thank you, Chris. And thank you for your interest in the project!