Document known flaws of lazy loading external modules
agucova opened this issue · 8 comments
As I described here, lazy loading external modules can break those modules silently, including for other libraries that use them, in weird or unexpected ways. I imagine you're already aware of this.
Given that there doesn't appear to be any reliable solution to this, I think it would be useful to at least document this flaw so users can recognize when something like this happens.
To reproduce
import lazy_loader as lazy
plt = lazy.load('matplotlib.pyplot')
plt
# This code can be in an arbitrary module
import matplotlib
import matplotlib.pyplot
matplotlib.pyplot
Gives:
AttributeError: module 'matplotlib' has no attribute 'pyplot'
This is probably due to matplotlib doing some fancy footwork on import.
What seems to be happening is that the submodule attribute lookup does not resolve, so it falls through to matplotlib's __getattr__ machinery, which doesn't know what to do with it. This could be related to the issue identified in PEP 690:
relying on imported submodules being set as attributes in the parent module
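To illustrate the fall-through described above, here is a minimal sketch using hypothetical in-memory modules (`demo_parent`, `demo_parent.child`) rather than matplotlib itself. A module-level `__getattr__` (PEP 562) is only consulted when normal attribute lookup fails, so a submodule that sits in `sys.modules` but was never bound on its parent ends up in the parent's `__getattr__`. A parent whose `__getattr__` raises for unknown names, as matplotlib's does in the traceback in this issue, then reports the submodule as missing.

```python
import sys
import types

# Hypothetical parent module with a PEP 562 module-level __getattr__ that raises
# for any name it does not recognise (similar in spirit to matplotlib's).
parent = types.ModuleType("demo_parent")


def _parent_getattr(name):
    raise AttributeError(f"module 'demo_parent' has no attribute {name!r}")


parent.__getattr__ = _parent_getattr
sys.modules["demo_parent"] = parent

# A submodule registered in sys.modules but never bound as an attribute of its parent.
child = types.ModuleType("demo_parent.child")
sys.modules["demo_parent.child"] = child

try:
    _ = parent.child  # normal lookup misses, so Python falls back to parent.__getattr__
    found_before = True
except AttributeError as exc:
    found_before = False
    print("before binding:", exc)

# For a real package, a regular eager import of the submodule performs this binding.
parent.child = child
print("after binding:", parent.child is child)
```

The point of the sketch is only that nothing in `sys.modules` is consulted by plain attribute access; once the attribute is bound, the parent's `__getattr__` is never reached.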
Here's a general reproducer (but it only fails on matplotlib):
import lazy_loader as lazy
for modname in ('numpy.fft', 'scipy.linalg', 'matplotlib.pyplot'):
    print()
    print('-'*50)
    print(f'Lazy importing {modname}')
    print('-'*50)
    rootmod, submod = modname.split('.')
    mod = lazy.load(modname)
    print(f'Module type: {type(mod)}')
    imported_rootmod = __import__(rootmod)
    imported_mod = __import__(modname)
    print(f'getattr {submod} on {rootmod}:', end='', flush=True)
    print(getattr(imported_rootmod, submod))
Its output looks as follows:
--------------------------------------------------
Lazy importing numpy.fft
--------------------------------------------------
Module type: <class 'importlib.util._LazyModule'>
getattr fft on numpy:<module 'numpy.fft' from '/home/stefan/envs/py311/lib64/python3.11/site-packages/numpy/fft/__init__.py'>
--------------------------------------------------
Lazy importing scipy.linalg
--------------------------------------------------
Module type: <class 'importlib.util._LazyModule'>
getattr linalg on scipy:<module 'scipy.linalg' from '/home/stefan/envs/py311/lib64/python3.11/site-packages/scipy/linalg/__init__.py'>
--------------------------------------------------
Lazy importing matplotlib.pyplot
--------------------------------------------------
Module type: <class 'importlib.util._LazyModule'>
getattr pyplot on matplotlib:Traceback (most recent call last):
  File "/tmp/lazy.py", line 15, in <module>
    print(getattr(imported_rootmod, submod))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/envs/py311/lib64/python3.11/site-packages/matplotlib/_api/__init__.py", line 219, in __getattr__
    raise AttributeError(
AttributeError: module 'matplotlib' has no attribute 'pyplot'
I'll see if there's anything we can do to rectify this problem "after the fact".
@dschult I see you looked at this before: networkx/networkx#5838 (comment)
Did you ever figure out whether we could make any fixes on the LazyLoader side, or whether this is an inherent limitation of importlib's lazy loader?
We could update the documentation to warn library authors against lazy loading outside of their own package, but it would be much better if we could avoid it.
Since we have full control over what load returns, I thought we might be able to replace the _LazyLoader class with a variant that addresses this. Initial experiments are not looking promising :) One thing I want to figure out is the mechanism by which import matplotlib.pyplot sets the pyplot attribute on matplotlib.
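One plausible mechanism, sketched below under stated assumptions: in CPython, the step that binds a submodule as an attribute of its parent runs when the submodule is actually found and loaded. If the submodule's name is already in `sys.modules`, a later `import parent.sub` simply returns that entry, and the binding never happens. The hypothetical `simplified_lazy_load` mirrors the `module_from_spec` / `sys.modules[fullname] = module` / `LazyLoader` lines visible in the diff further down; obtaining the spec with `importlib.util.find_spec` is an assumption, and `demo_pkg` is a throwaway package created on the fly.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Throwaway package (hypothetical name) with two plain submodules.
root = Path(tempfile.mkdtemp())
(root / "demo_pkg").mkdir()
(root / "demo_pkg" / "__init__.py").write_text("")
(root / "demo_pkg" / "lazy_sub.py").write_text("VALUE = 1\n")
(root / "demo_pkg" / "eager_sub.py").write_text("VALUE = 2\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def simplified_lazy_load(fullname):
    """Hypothetical, simplified stand-in for lazy.load: register the module in
    sys.modules first, then defer executing it with importlib's LazyLoader."""
    spec = importlib.util.find_spec(fullname)  # assumption; also imports the parent
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    importlib.util.LazyLoader(spec.loader).exec_module(module)
    return module


lazy_sub = simplified_lazy_load("demo_pkg.lazy_sub")

# The name is already in sys.modules, so the import system hands back that entry
# and never binds `lazy_sub` as an attribute of demo_pkg.
import demo_pkg.lazy_sub  # noqa: E402

# Not in sys.modules yet, so this is a full import, which does bind `eager_sub`.
import demo_pkg.eager_sub  # noqa: E402

print("eager_sub bound on demo_pkg:", hasattr(demo_pkg, "eager_sub"))
print("lazy_sub bound on demo_pkg:", hasattr(demo_pkg, "lazy_sub"))
```

If this is what happens with matplotlib, numpy and scipy would escape it only because their own import machinery makes the submodule attribute resolve anyway, while matplotlib's `__getattr__` raises instead.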
BTW, I've noticed that when I do not add submodules to sys.modules, the problem goes away.
I.e.:
diff --git a/lazy_loader/__init__.py b/lazy_loader/__init__.py
index ed6f7dd..eb2c77f 100644
--- a/lazy_loader/__init__.py
+++ b/lazy_loader/__init__.py
@@ -184,7 +184,8 @@ def load(fullname, error_on_import=False):
del parent
module = importlib.util.module_from_spec(spec)
- sys.modules[fullname] = module
+ if not "." in fullname:
+ sys.modules[fullname] = module
loader = importlib.util.LazyLoader(spec.loader)
loader.exec_module(module)
@dschult perhaps you can help me think through why that is/is not a good idea.
Unfortunately, that triggers a greedy import of the parent module:
import sys
import lazy_loader as lazy
plt = lazy.load('matplotlib.pyplot')
print('Lazy loaded module type:', type(plt))
print('Parent module type in sys.modules?:', type(sys.modules['matplotlib']))
At this point, we may want to consider dropping lazy.load, or at least documenting its behavior very carefully.
If more libraries use the lazy attachment functions we provide instead, then importing becomes cheap(er), and load will no longer be as necessary.
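The greedy import of the parent noted above may come from `importlib.util.find_spec` itself: its documentation states that for a dotted name the parent module is automatically imported. Whether `lazy.load` obtains its spec this way is an assumption here, since the diff does not show that line. The sketch uses a hypothetical throwaway package, `greedy_pkg`, whose `__init__` leaves a marker when it runs.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Throwaway package (hypothetical name) whose __init__ leaves a marker when it runs.
root = Path(tempfile.mkdtemp())
(root / "greedy_pkg").mkdir()
(root / "greedy_pkg" / "__init__.py").write_text("INIT_RAN = True\n")
(root / "greedy_pkg" / "sub.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

parent_loaded_before = "greedy_pkg" in sys.modules
spec = importlib.util.find_spec("greedy_pkg.sub")  # only locates the submodule...
parent_loaded_after = "greedy_pkg" in sys.modules  # ...but the parent's __init__ ran
sub_loaded = "greedy_pkg.sub" in sys.modules

print("parent imported before find_spec:", parent_loaded_before)
print("parent imported after find_spec: ", parent_loaded_after)
print("submodule imported:              ", sub_loaded)
```

Only the submodule's execution is deferred in this pattern; the parent package is fully imported as soon as the spec is looked up.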
The issue with matplotlib is likely related to their custom module-level __getattr__. At least, that's the part of the code that raises: AttributeError: module 'matplotlib' has no attribute 'pyplot'
I can imagine that the caching nature of their __getattr__, combined with the delayed import, causes a lookup to happen before the library is actually loaded. Maybe it even messes up the lazy_load __getattr__ magic. But I don't have a setup for exploring that fully yet.
I agree that in the long run as more libraries use the lazy attachment functions and importing becomes cheaper, load will no longer be as necessary. And the amount of possible pain for cases like this one is likely to exceed the benefit of the load feature. So, I'm +1 on dropping lazy.load.
Clearly, people are already using it though, so I'll try to find a way to make it work with matplotlib. Do we know of any other libraries for which it isn't working well?
I'm not sure if the root cause is the same, but transformers fails as well:
import lazy_loader as lazy
transformers = lazy.load("transformers")
print(transformers)
This gives:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print(transformers)
  File "<frozen importlib._bootstrap>", line 296, in _module_repr
  File "<frozen importlib.util>", line 252, in __getattribute__
ValueError: module object for 'transformers' substituted in sys.modules during a lazy load
@bra-fsn I suspect your issue was due to lazy.load being used in parallel, but not sure. Please open a new issue if you can reproduce.
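For context on the transformers traceback: the ValueError is raised by `importlib.util`'s lazy module wrapper, which, when the deferred load finally runs, checks that `sys.modules` still holds the same module object and raises if it does not. The sketch below trips that check with a hypothetical module, `self_swapping`, that replaces its own `sys.modules` entry while executing. This is only one route to the error; it is an assumption, not something established in this thread, that transformers does anything similar, and concurrent `lazy.load` calls (the guess above) would be another route.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Throwaway module (hypothetical name) that swaps its own sys.modules entry on import.
root = Path(tempfile.mkdtemp())
(root / "self_swapping.py").write_text(
    "import sys, types\n"
    "sys.modules[__name__] = types.ModuleType(__name__)\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Same register-then-defer pattern as in the diff earlier in this issue.
spec = importlib.util.find_spec("self_swapping")
module = importlib.util.module_from_spec(spec)
sys.modules["self_swapping"] = module
importlib.util.LazyLoader(spec.loader).exec_module(module)

error = None
try:
    module.anything  # first attribute access runs the deferred exec_module
except ValueError as exc:
    error = exc

print(repr(error))
```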
@agucova We've documented the known deficiency that lazily loading a submodule triggers an immediate import of its parent package.
Matplotlib does a bunch of weird magic during its import, so it's perhaps best to steer clear of lazy loading that library.