Clarification about how to implement namespace packages (as in PEP 420) via import hooks for PEP 660 use case
abravalheri opened this issue ยท 21 comments
This is a follow up of 2 previous discussions in the Python discourse and an issue in pypa/packaging-problems:
- https://discuss.python.org/t/pep-660-and-setuptools/14855/15
- https://discuss.python.org/t/how-to-implement-namespace-packages-as-in-pep-420-via-import-hooks/15192
- pypa/packaging-problems#588
Summary
One of the popular ways of implementing PEP 660 is via import hooks (since other alternatives like symlinking package folders are not available in all platforms).
However the reference implementation editables currently does not support namespaces.
Specifically the dynamic path computation aspect of PEP 420 is a challenge.
After inspecting the implementation for importlib._bootstrap_external, I may have managed to find 2 solutions for this problem, described in the detail in this discourse post and demonstrated as a PoC in this gist:
- Solution A: Create a custom "path class" that tries to emulate the
_NamespacePathbehaviour. - Solution B: Create a custom
PathEntryFinderand add a โbogusโ entry tosys.pathjust to trigger it (implicitly relying thatimportlib.machinery.PathFinderwill take care of setting__path__to_NamespacePathobject under the hood).
However, the problem is that I don't know if these approaches are reliable in the long run and in accordance with the language specification.
Open Questions
Questions like the following are still not clear to me:
- Is there a chance that an internal check for namespace packages will fail if
__path__uses a custom class instead of_NamespacePatheven if the behaviour described in PEP 420 is observed? - Can we rely on the fact that the import machinery will automatically set the namespace loader for specs in the form of
ModuleSpec("name", None, is_package=True)? - Is
pkgutil(and particularlypkgutil.extend_path) still considered first class citizen of the standard library or is it better to avoid it when writing new code? - Can we rely on the behaviour of
sys.path_hooks+importlib.machinery.PathFinderto createModuleSpecobjects for namespaces that will automatically perform dynamic path computation? Or is this an internal implementation detail that can change in the future?
If the discussed approaches are not recommended, how can we support namespace packages using import hooks and deliver a complete solution for PEP 660?
I'm going to preface all of this with @ericvsmith is the namespace package expert. ๐
- Is there a chance that an internal check for namespace packages will fail if
__path__uses a custom class instead of_NamespacePatheven if the behaviour described in PEP 420 is observed?
I don't think so. Easiest way to check is to search for _NamespacePath and make sure there aren't any isinstance() checks. Otherwise experiment and let us know. ๐
2. Can we rely on the fact that the import machinery will automatically set the namespace loader for specs in the form of
ModuleSpec("name", None, is_package=True)?
As in that exact call signature for ModuleSpec? I wouldn't count on that being like that forever as we don't make any promises around that.
3. Is
pkgutil(and particularlypkgutil.extend_path) still considered first class citizen of the standard library or is it better to avoid it when writing new code?
pkgutil is on its way out as soon as people have the time to deprecate it.
4. Can we rely on the behaviour of
sys.path_hooks+importlib.machinery.PathFinderto createModuleSpecobjects for namespaces that will automatically perform dynamic path computation?
The language spec just says it "works" without giving details in how.
I think there are two possible ways to solve this (and we probably want both long-term). One, you register a custom finder that effectively re-implements namespace packages the way you need it to work. Two, we develop whatever APIs, etc. you need to make this as pluggable as possible.
Thank you very much for the clarifications @brettcannon.
I agree with your conclusions, if there were APIs officially supported for implementing the namespace packages using a custom finders, I would be more than happy to use them. The main problem that I am facing right now is the impression that things work "accidentally".
I think it is also worth a shot considering the importance of editable installs in the community.
I think it is also worth a shot considering the importance of editable installs in the community.
Yep, this is just pushing up against the niche nature of namespace packages.
All of the hard-coded locations of using namespace-related classes seem to be:
cpython/Lib/importlib/_bootstrap_external.py
Lines 1496 to 1499 in a91ffcf
cpython/Lib/importlib/_bootstrap_external.py
Lines 1514 to 1519 in a91ffcf
cpython/Lib/importlib/_bootstrap.py
Lines 505 to 512 in f4c0348
The first two are all in PathFinder, so that could grow a keyword argument or something in its initializer to allow for overriding the defaults.
But the latter is deep in the import system, and so that would require something on sys.
This is also assuming I didn't miss a spot. ๐
Would this entry also be relevant?
cpython/Lib/importlib/_bootstrap_external.py
Lines 1409 to 1411 in 605e9c6
It also seems that in 3.11 NamespaceLoader is promoted to a "public API", but just for the sake of introspection (the example given in the docs shows an isinstance check). The documentation implies that direct instantiation is not allowed.
It would seem that the existing behaviour is highly dependent on _NamespacePath and NamespaceLoader, but there is no public API to construct the objects.
Would this entry also be relevant?
Yep.
It would seem that the existing behaviour is highly dependent on
_NamespacePathandNamespaceLoader, but there is no public API to construct the objects.
Correct. We purposefully didn't over-expose an API we would have to support for a decade or more when no one had really asked for access previously.
Please let me know if there is anything I can do to help with this.
As I previously mentioned, approach B seems to work right now across the major active versions of CPython (tested in setuptools), and it does not involve using soon-to-be-deprecated functionality as pkgutil or re-implementing big parts of the import machinery. So maybe the easiest approach here would be to add tests to capture this expectation and prevent accidental regressions.
Brainstoming of alternatives/complementary approaches
-
By looking at the
_bootstrap_externalcode (and considering the fact that the newest API already exposesNamespaceLoaderfor users to compare viaisinstance), I imagine that exposing bothNamespaceLoaderandNamespacePathto allow users to directly instantiate them would do the trick. -
Maybe adding a public constructor for namespace specs (e.g.
NamespaceSpecorimportlib.util.namespace_spec_from_portion) could also do the trick. -
_NamespacePathwould very likely requireappendoperations to not be undone in the_recalculatemethod...
Please let me know if there is anything I can do to help with this.
More free time or get @warsaw , @ericvsmith , or @ericsnowcurrently to weigh in.
Specifically the dynamic path computation aspect of PEP 420 is a challenge.
TBH, I'd totally forgotten about that language in the PEP. @ericvsmith, I don't remember why this was added. That said, while I can understand that there might be some usefulness to this in deployed code, I'm not sure how useful it is in the same environments where you'd want editable installs (during the development process, right?). I personally can't remember any time where I or a package I've used relied on this behavior.
Is there a chance that an internal check for namespace packages will fail if path uses a custom class instead of _NamespacePath even if the behaviour described in PEP 420 is observed?
The one place folks seem to have missed is in importlib.resources.reader. There's a shallow dependency on the str() of the argument to NamespaceReader, but that would be easily faked (i.e. by calling your class _EditableNamespacePath). @jaraco for thoughts on the implications or ways to make this more generic.
It also seems that in 3.11 NamespaceLoader is promoted to a "public API", but just for the sake of introspection (the example given in the docs shows an isinstance check). The documentation implies that direct instantiation is not allowed.
I've thought about exposing _NamespacePath as a public name for introspection purposes, but it didn't seem as urgent as NamespaceLoader.
Thank you very much for the comments @warsaw.
I'm not sure how useful it is in the same environments where you'd want editable installs (during the development process, right?). I personally can't remember any time where I or a package I've used relied on this behavior.
As long as developers are allowed to create and publish namespace packages, editable installs will be relevant during the development process of those packages. I don't know how many namespace packages are published in PyPI, but I would not be surprised if their authors/maintainers use an editable install when implementing changes/bugfixes/features etc.
As an anecdote (that might not be representative of the general tendencies), when I asked about feedback from the community about setuptools candidate implementation of PEP 660, one of the few responses I got back was about a package using namespaces1.
Footnotes
-
This project in particular is using
pkgutilnamespaces to be precise, not PEP 420. But I think that is fair to imagine that eventually all namespace packages will end up migrating to PEP 420. Another example are the popularsphinxcontrib-*packages which usepkg_resourcesstyle namespaces. โฉ
As long as developers are allowed to create and publish namespace packages, editable installs will be relevant during the development process of those packages. I don't know how many namespace packages are published in PyPI, but I would not be surprised if their authors/maintainers use an editable install when implementing changes/bugfixes/features etc.
I should have been more clear in my comment. In fact, I'm an author of several packages in the flufl namespace (albeit only flufl.i18n and flufl.lock are still actively maintained), and yes I still want editable installs to work with them. What I was specifically wondering the utility of was whether dynamic path calculation was actually useful while developing those namespace packages, i.e. when editable installs are useful. In deployed code, I can't think of why I'd want editable installs.
The dynamic path calculation is the only aspect of namespace packages that is problematic (as far as I can tell, it's been a while since I looked at the code) and to be perfectly honest I'm not completely sure what the use cases are for that feature. In general, we don't try to guarantee what will happen in a running Python process if new packages are installed "in the background", so most of the cases I've been able to come up with for the dynamic path feature feel like things I wouldn't recommend doing anyway...
Can anyone explain what the intended use of the dynamic path feature is? If there's no use cases that we actually want to support, would it be easier to simply deprecate the feature?
Can anyone explain what the intended use of the dynamic path feature is?
I can't. As I said, I didn't even remember that being part of the PEP! ๐
If there's no use cases that we actually want to support, would it be easier to simply deprecate the feature?
WFM. Or maybe another way to frame the question is, has anybody been asking for this feature?
IIRC, Nick and PJE wanted the feature.
/me checks โฆ
https://peps.python.org/pep-0420/#id8 mentions it, and links to where it was discussed. PJE talks about it here: https://mail.python.org/pipermail/python-dev/2012-May/119603.html
I think this was included so we could interoperate with legacy namespace packages in setuptools. That use case is probably long gone.
Cool. So what would be involved in deprecating that behaviour? Would it need a PEP? For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason, but it would be good to formally deprecate and then remove the feature, to avoid confusion.
For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason
Thank you very much, I will rework the exiting implementation taking this discussion into consideration.
Should I also be worried about using the constructor of ModuleSpec directly and accessing the submodule_search_locations attribute? Both items seem to be part of the documented public API on docs.python.org, and I think they would required to implement a MetaPathFinder.find_spec that is able to handle namespace packages (even with the dynamic path computation requirement dropped) ... (Or do you have a different suggestion?)
Cool. So what would be involved in deprecating that behaviour? Would it need a PEP? For the purposes of editable installs we could just not support it, citing this discussion and any subsequent deprecation as the reason, but it would be good to formally deprecate and then remove the feature, to avoid confusion.
The problem is that we know why the feature was originally added, but we don't know who is relying on it for other reasons. I think this would need a wide discussion, and probably a PEP.
On reflection, I'm not sure deprecation is the right thing to do here. Having re-read the PEP, the encodings example is convincing - people do modify sys.path, and that is supported, and it is reasonable to expect that if someone does import foo.xxx, then modifies sys.path to add a directory that already contains foo/yyy, then import foo.yyy would work. That's very different from modifying the filesystem while the application is running (which is the thing that we discourage because it has non-obvious effects).
So I think we probably should retain the current behaviour, and ideally make it easier for custom importers to hook into the namespace path mechanism.
So I think we probably should retain the current behaviour,
+1
and ideally make it easier for custom importers to hook into the namespace path mechanism.
+1
This is where my mind went pretty quickly. ๐ Clearly there's a meaningful case for hooking into/duplicating the namespace path logic. It seems like helpers in importlib.util and/or importlib.machinery (exposing or derived from the existing _bootstrap*.py code) would meet the need here.
FWIW, every time I look in _bootstrap*.py and bump into the special-casing for namespace packages, I wonder if there might be a less special-case-y way to get there. At the least, having no public API to replicate the functionality has always seemed off to me. (That said, there is value, of course, in not providing public API until actually needed. ๐)
That's very different from modifying the filesystem while the application is running (which is the thing that we discourage because it has non-obvious effects).
Agreed.
I'm still wondering, short of deprecating this feature, are there users who are asking for support for this with editable installs?
I'm not aware of any. My library editables doesn't support it when using a custom importer. I don't know how many people are using the library in that mode, but I've not had any complaints. As setuptols has a far larger user base, @abravalheri may have more useful feedback, though.
As setuptols has a far larger user base, @abravalheri may have more useful feedback, though.
I just recently invited the community to start experimenting with the new editables support, and the current implementation does try its best to be compatible with the dynamic path computation (I am using the approach B discussed above). So I also don't have any input about that.
I could try to modify the implementation to not support dynamic path computation and see what would be the reaction of the user base, if that is the core team recommendation. (I imagine if someone were to complain, we are just going to find out after the feature is included in a stable release for setuptools).