Check which attributes have been loaded without triggering imports

Question


lagru opened this issue 3 years ago · 9 comments

Is there a mechanism to check which objects have been loaded and which ones are still in the "lazy" state? Looking through the code and inspecting the objects returned by lazy.attach_stub, I didn't see an obvious way to do so.

Maybe this could be addressed by making attach return objects instead of plain functions, e.g.

__getattr__, __dir__, _ = lazy.attach_stub(__name__, __file__)
__getattr__.loaded_names  # return names which were already loaded

I think this would be very helpful in debugging and testing that lazy loading actually works as intended.

I'd be happy to work on this if there is interest!

Answer 1 · 2023-04-13T14:30:19.000Z

Inspecting sys.modules might help in some cases, though it is less explicit than checking an object that remembers which attributes were accessed.
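As a minimal sketch of the sys.modules route: every module that has actually been imported appears in sys.modules under its fully qualified name, so listing the entries under a package prefix shows which submodules are loaded. The helper name imported_submodules is hypothetical, and json is used only because it ships with Python and eagerly imports some of its own submodules.

```python
import sys


def imported_submodules(package: str) -> list[str]:
    """Return `package` and its submodules that are already in sys.modules."""
    prefix = package + "."
    # sys.modules maps fully qualified names to modules that were actually
    # imported; copy the keys first in case an import happens meanwhile.
    names = list(sys.modules)
    return sorted(n for n in names if n == package or n.startswith(prefix))


if __name__ == "__main__":
    import json  # the json package imports json.decoder and json.encoder eagerly

    print(imported_submodules("json"))
```

This only reveals imported modules, not which attributes (such as functions re-exported through a lazy __getattr__) have been accessed, which is the limitation mentioned above.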

Answer 2 · 2023-04-13T15:59:59.000Z

You could also just attach that dict to the function itself.

In [3]: def __get__(x):
   ...:     return x
   ...:

In [4]: __get__.__loaded = ['foo']

In [5]: __get__.__loaded
Out[5]: ['foo']
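A minimal sketch of the function-attribute idea, assuming a hand-written lazy __getattr__ factory rather than lazy_loader's own code: the closure records each loaded name in a loaded_names set stored on the function object. make_getattr and loaded_names are hypothetical names, and json stands in for a real lazily loaded package.

```python
import importlib


def make_getattr(package_name, submodules):
    """Build a module-level __getattr__ that imports submodules on first access.

    Hypothetical sketch: each loaded name is recorded in a `loaded_names`
    set attached to the returned function object itself.
    """
    submodules = frozenset(submodules)

    def __getattr__(name):
        if name not in submodules:
            raise AttributeError(f"module {package_name!r} has no attribute {name!r}")
        module = importlib.import_module(f"{package_name}.{name}")
        __getattr__.loaded_names.add(name)  # this name is no longer "lazy"
        return module

    # Functions accept arbitrary attributes, so the state can live on the function.
    __getattr__.loaded_names = set()
    return __getattr__


if __name__ == "__main__":
    getattr_ = make_getattr("json", ["decoder", "encoder"])
    getattr_("decoder")
    print(getattr_.loaded_names)  # {'decoder'}
```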

The question is who will be looking at that. Perhaps it could be done only when a debug flag is set in the environment.

Answer 3 · 2023-04-13T17:57:57.000Z

Hmm, attach creates a new __getattr__() closure on each call, so I guess this would work, though it feels very hacky to me. We do have classes to combine logic and state; a simple class with __call__ and __repr__ seems like the saner and more flexible architectural choice to me. 😅
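For comparison, a sketch of the class-based alternative. Python calls a module-level __getattr__ with the missing attribute's name (PEP 562), so any callable object can take its place, and that object can carry state and a readable __repr__. The class LazyGetattr and its attributes are hypothetical and not part of lazy_loader.

```python
import importlib
import types


class LazyGetattr:
    """Hypothetical callable that can stand in for a lazy module __getattr__."""

    def __init__(self, package_name, submodules):
        self.package_name = package_name
        self.submodules = frozenset(submodules)
        self.loaded_names = set()

    def __call__(self, name):
        # Python calls the module's __getattr__(name) only when normal lookup fails.
        if name not in self.submodules:
            raise AttributeError(
                f"module {self.package_name!r} has no attribute {name!r}"
            )
        module = importlib.import_module(f"{self.package_name}.{name}")
        self.loaded_names.add(name)
        return module

    @property
    def pending_names(self):
        """Names that are still in the "lazy" (not yet loaded) state."""
        return self.submodules - self.loaded_names

    def __repr__(self):
        return (
            f"<LazyGetattr {self.package_name!r} "
            f"loaded={sorted(self.loaded_names)} "
            f"pending={sorted(self.pending_names)}>"
        )


if __name__ == "__main__":
    # Simulate a package module whose __getattr__ is an instance of the class.
    demo = types.ModuleType("demo")
    demo.__getattr__ = LazyGetattr("json", ["decoder", "encoder"])
    demo.encoder  # falls through to LazyGetattr.__call__("encoder")
    print(repr(demo.__getattr__))
    # <LazyGetattr 'json' loaded=['encoder'] pending=['decoder']>
```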

Good point about the debug flag. I guess the bigger question is how to reliably and automatically test that accidental imports don't defeat lazy loading. Basically, I'd like something to turn red if a PR triggers an eager import in scikit-image.
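One way to get that automatic red light, sketched under the assumption that starting a fresh interpreter per check is acceptable: import the package in a subprocess, where sys.modules starts clean, and list which of its submodules were imported as a side effect. The function eagerly_imported is hypothetical.

```python
import subprocess
import sys


def eagerly_imported(package):
    """Import `package` in a fresh interpreter and return the names of its
    submodules that were imported as a side effect (hypothetical helper)."""
    prefix = package + "."
    child_code = (
        "import sys\n"
        f"import {package}\n"
        "for name in sorted(sys.modules):\n"
        f"    if name.startswith({prefix!r}):\n"
        "        print(name)\n"
    )
    # A new process starts with a clean sys.modules, unlike the current one.
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,  # a failing import raises CalledProcessError
    )
    return result.stdout.split()


if __name__ == "__main__":
    print(eagerly_imported("json"))
```

A test could then assert that the returned names are a subset of an explicitly expected set, so that a PR adding an eager import fails it. Because the package name is pasted into the child's source code, this should only be called with trusted, literal names.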

Answer 4 · 2023-04-13T18:21:01.000Z

If that's all you want to do, then you can just add a check to getattr that raises if a certain env variable is set, or if you're inside a certain context manager.
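A sketch of that suggestion, wrapping an existing lazy __getattr__ so that it raises while an environment variable is set or while a context manager is active. The variable LAZY_LOADER_FORBID, the forbid_lazy_loads context manager, and the LazyLoadForbidden exception are hypothetical names, not lazy_loader features. A contextvars.ContextVar keeps the flag local to the current thread or async task, and subclassing ImportError means hasattr() and getattr() with a default, which only swallow AttributeError, won't hide the failure.

```python
import contextlib
import contextvars
import os

# Hypothetical names for illustration; none of these are lazy_loader features.
FORBID_ENV_VAR = "LAZY_LOADER_FORBID"
_forbid = contextvars.ContextVar("forbid_lazy_loads", default=False)


class LazyLoadForbidden(ImportError):
    """Raised when a lazy load happens while lazy loads are forbidden."""


@contextlib.contextmanager
def forbid_lazy_loads():
    """Forbid lazy loads inside the `with` block (current thread/task only)."""
    token = _forbid.set(True)
    try:
        yield
    finally:
        _forbid.reset(token)


def guard(getattr_func, package_name):
    """Wrap a lazy __getattr__ so that it raises instead of loading when forbidden."""

    def __getattr__(name):
        if os.environ.get(FORBID_ENV_VAR) or _forbid.get():
            raise LazyLoadForbidden(
                f"lazy load of {package_name}.{name} was triggered"
            )
        return getattr_func(name)

    return __getattr__
```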

Answer 5 · 2024-01-25T20:26:49.000Z

I'm closing for now, since there's no obvious action to take.

Answer 6 · 2024-01-26T11:35:18.000Z

Could we reopen this, given your suggestion

you can just add a check to getattr that raises if a certain env variable is set, or if you're inside a certain context manager.

I'd be happy to take that on at some point.

Answer 7 · 2024-03-15T23:06:32.000Z

@lagru I'm getting back to this issue; could you help me understand what kind of debugging you need to do? Is it sufficient to raise on getattr? That will only catch the first instance. Would logging be better?
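Raising stops at the first offending access, whereas logging records every lazy load and keeps going. A minimal sketch of the logging variant, where the logger name lazy_loader_debug and the wrapper logged are hypothetical; stack_info=True attaches the call stack to each record so the log shows what triggered the load.

```python
import logging

# Hypothetical logger name; enable with e.g. logging.basicConfig(level=logging.DEBUG).
logger = logging.getLogger("lazy_loader_debug")


def logged(getattr_func, package_name):
    """Wrap a lazy __getattr__ so that every load is logged rather than raised."""

    def __getattr__(name):
        value = getattr_func(name)  # perform the real lazy load first
        # stack_info=True appends the call stack, showing what triggered the load.
        logger.debug("lazy-loaded %s.%s", package_name, name, stack_info=True)
        return value

    return __getattr__
```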

Answer 8 · 2024-03-16T13:03:47.000Z

Basically, I'd like to be able to test the assumption that nothing is loaded for a given import. I like your environment variable idea. What do you think about an API like this?

LAZY_LOADER_RAISE_ON=".*" python -c "import skimage"
LAZY_LOADER_RAISE_ON="skimage.restoration" python -c "import skimage.restoration"

LAZY_LOADER_RAISE_ON would tell lazy_loader to raise an error whenever it is asked to load something whose __qualname__ matches the regex. This seems reasonably simple to implement. If necessary, we could even add LAZY_LOADER_ALLOW for a very flexible blacklist/whitelist approach.
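A sketch of how the proposed variables could be checked before each lazy load. LAZY_LOADER_RAISE_ON and LAZY_LOADER_ALLOW are only proposals in this thread, and check_lazy_load is a hypothetical helper. It uses re.match, which anchors at the start of the name, so LAZY_LOADER_RAISE_ON="skimage.restoration" also catches names below that module; re.fullmatch would be the stricter alternative. Empty values are treated as unset.

```python
import os
import re


def check_lazy_load(name, environ=None):
    """Raise ImportError if loading `name` is forbidden by the environment.

    Sketch of the *proposed* LAZY_LOADER_RAISE_ON / LAZY_LOADER_ALLOW
    variables; `name` is the fully qualified name of the object to load.
    """
    environ = os.environ if environ is None else environ
    raise_on = environ.get("LAZY_LOADER_RAISE_ON")
    allow = environ.get("LAZY_LOADER_ALLOW")
    # re.match anchors at the start of `name`, so a pattern also covers children.
    if not raise_on or not re.match(raise_on, name):
        return None  # not blocked
    if allow and re.match(allow, name):
        return None  # explicitly allowed, overriding the block
    raise ImportError(
        f"lazy load of {name!r} matches LAZY_LOADER_RAISE_ON={raise_on!r}"
    )
```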

Ideally, I'd like an approach that could be called from within Python, but that would require a clean import slate for the current Python process. This could be done with subprocess, but then it's no better than the environment-variable approach above. What do you think?

Answer 9 · 2024-03-16T13:05:19.000Z

Being able to check, in the current console, whether "X" has been imported would be a bonus that might help with debugging.