hpyproject/hpy

Problem with adding a Python file next to the universal extension to make it importable

Opened this issue · 13 comments

Some Python compilers (for example Cython in pure Python mode and Pythran) rely on the priority given to extension modules at import time: when importing foo, if both foo.py and foo.so exist, foo.so is imported.

With such compilers, the developer writes Python code in a file foo.py and calls the compiler to generate an extension foo.so. Under the hood, a .c or .cpp file is created and compiled. Importing foo then uses the extension because Python prioritizes it over the .py file.

This mechanism seems to be incompatible with the way HPy currently works (adding a Python file with the same name as the extension). Therefore, it would be better to find another way to make universal extensions importable. In particular, the standard behavior of prioritizing a universal extension over a Python file should be preserved.

I realize that this is not a high priority problem for HPy! But I think it is worth mentioning the issue.

Do you have any suggestions for how to do this? The difficulties are as follows:

  • HPy universal modules should be universal (i.e. the same .hpy.so file should be loadable by different Python implementations, and in particular the arguments and return value of HPy_MODINIT should be ABI compatible)
  • HPy universal modules should be importable without having to first load some custom import hooks (e.g. import hpy; hpy.install_hy_universal_importer())

It would definitely be great not to have the .py stub files for universal modules.

The current plan is to ask CPython people if they can recommend or add a way to do this.

Is CPython prioritising extension module over .py files a guaranteed behaviour, or an implementation detail that gets exploited and relied on? I get this can be useful, but maybe it could still be not a good thing to do?

One method would be to use a *.pth file, which site executes import statements inside. Then it’ll automatically be loaded.

> Is CPython prioritising extension module over .py files a guaranteed behaviour, or an implementation detail that gets exploited and relied on? I get this can be useful, but maybe it could still be not a good thing to do?

It's the order in which the FileLoader classes are defined in _get_supported_file_loaders

This seems to not have changed in a long time, even Built-in Package Support in Python 1.5 mentions it.
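The ordering mentioned above can be inspected directly. The following is only a sketch: _get_supported_file_loaders is a private helper in importlib._bootstrap_external, so its name and return shape are assumptions that may not hold on every Python version or implementation.

```python
import importlib.machinery as machinery
from importlib import _bootstrap_external

# Private helper (assumption: it exists and returns (loader, suffixes) pairs).
loaders = _bootstrap_external._get_supported_file_loaders()

for loader_cls, suffixes in loaders:
    print(loader_cls.__name__, suffixes)

# Compare the position of the extension loader with the source loader.
loader_order = [loader_cls for loader_cls, _ in loaders]
extension_first = (
    loader_order.index(machinery.ExtensionFileLoader)
    < loader_order.index(machinery.SourceFileLoader)
)
print("extensions tried before .py files:", extension_first)
```

On a regular CPython build this is expected to list the extension loader ahead of the source loader, which is what makes foo.so win over foo.py.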

> One method would be to use a *.pth file, which site executes import statements inside. Then it’ll automatically be loaded.

A good idea, although there is talk of deprecating import ... statements in .pth files. Putting something in site.py or sitecustomize.py could be another option. Perhaps each hpy.universal implementation could install such a .pth file and then built HPy Universal ABI modules could be loaded directly without needing a stub .py file.
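For readers unfamiliar with the mechanism, here is a small sketch of how site treats a .pth file: any line that starts with "import" is executed. The file name hpy_hook_demo.pth and the HPY_PTH_DEMO environment variable are made up for this illustration; a real hook would install an importer instead of setting a variable.

```python
import os
import site
import tempfile

# Temporary directory standing in for site-packages.
site_dir = tempfile.mkdtemp()

# A .pth line starting with "import" is executed by the site module.
pth_path = os.path.join(site_dir, "hpy_hook_demo.pth")  # hypothetical name
with open(pth_path, "w") as f:
    f.write("import os; os.environ.setdefault('HPY_PTH_DEMO', '1')\n")

# addsitedir() processes the .pth files found in the directory,
# as the interpreter does for site-packages at startup.
site.addsitedir(site_dir)

print(os.environ.get("HPY_PTH_DEMO"))
```

This also shows why the approach is regarded as fragile: the hook is just a line of code hidden in a path configuration file.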

> It's the order in which the FileLoader classes are defined in _get_supported_file_loaders
>
> This seems to not have changed in a long time, even Built-in Package Support in Python 1.5 mentions it.

Hmm, interesting. So the 1.5 documentation contains this line:

Tip: the search order is determined by the list of suffixes returned by the function imp.get_suffixes().

imp is deprecated, and the current documentation says:

Deprecated since version 3.3: Use the constants defined on importlib.machinery instead.

But importlib.machinery does not have an equivalent constant; the closest thing is all_suffixes(), and its value does not match the implemented import logic: it lists the source suffixes first, whereas the import machinery tries extension modules first.

So it seems there is no inherent logical ordering to the file extensions, and the fact that all_suffixes() went in without anyone wanting it to match the actual import logic suggests to me that the core devs don't consider the order in which file extensions are tried to be a specified thing?
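A quick way to see the mismatch is to look at where the source and extension suffixes sit in the list returned by all_suffixes(). This is a sketch that assumes a CPython build with at least one extension suffix; the exact suffix strings are platform dependent, so none are hard-coded here.

```python
import importlib.machinery as machinery

suffixes = machinery.all_suffixes()
print(suffixes)

# Position of the earliest source suffix and the earliest extension suffix.
first_source = min(suffixes.index(s) for s in machinery.SOURCE_SUFFIXES)
first_extension = min(suffixes.index(s) for s in machinery.EXTENSION_SUFFIXES)

# all_suffixes() lists '.py' before the extension suffixes, which is the
# opposite of the order in which the path finder tries them.
source_listed_first = first_source < first_extension
print("source suffixes listed first:", source_listed_first)
```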

> Is CPython prioritising extension module over .py files a guaranteed behaviour, or an implementation detail that gets exploited and relied on? I get this can be useful, but maybe it could still be not a good thing to do?

@uranusjr, I don't understand your point. Do you mean that HPy doesn't need to care about this because it is not well documented in importlib?

Prioritizing an extension over a .py file has been consistent behavior since Python 1.5. It is a very simple and reasonable behavior: if someone adds an extension next to a .py file, it's because it's better to use the extension. This priority is used by tools like Cython and Pythran, and I guess internally by other packages. There is no reason to change that and I don't see why it would change in future Python versions. So I guess if HPy can't get this behavior, one can write a PEP to propose this breaking change and propose a nice alternative :-)

A strong point of HPy is that the whole transition seems doable because for most packages without hand-written C code (for example scikit-learn or scikit-image), switching to HPy will mostly mean changing a few lines in setup.py / pyproject.toml. If no nice solution is found for this issue, it won't be like that at all.

I wonder if it could be technically possible to provide a function hpy.make_universal_extensions_importable that can be called early in the init process of the packages so that universal extensions would be importable without stub .py files?

Yeah, I’m more or less trying to imply that a) this probably needs a clarification from CPython core devs, and b) if the behaviour is not intended to be relied on, this could be considered out of scope of HPy (clarification: this is most definitely only a personal opinion without any relation to HPy developers).

There are many ways to achieve the “extension over pure Python” goal without relying on file extension ordering, e.g. have an optional _mylib_speedup module and wrap from _mylib_speedup import * in a try/except. This would definitely mean additional work for the transition, but adopting HPy already requires modifying your build scripts, so I’d argue that may not be a bad thing if the ordering is not something to be relied on.
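A minimal sketch of that fallback pattern follows. The module name _mylib_speedup comes from the comment above; the function fib and the flag USING_SPEEDUP are invented for illustration, and an explicit name is imported instead of * to keep the example easy to follow.

```python
try:
    # Optional compiled module; only present if the extension was built.
    from _mylib_speedup import fib
    USING_SPEEDUP = True
except ImportError:
    USING_SPEEDUP = False

    def fib(n):
        """Pure Python fallback with the same interface."""
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a


print(fib(10), "compiled" if USING_SPEEDUP else "pure Python")
```

The cost compared with the foo.py / foo.so trick is that the package needs two module names and an explicit dispatch point, which is the extra transition work discussed here.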

Note that if CPython says the behaviour is promised to be reliable, none of the above applies, and HPy should definitely support the use case (and CPython should probably fix importlib.machinery.all_suffixes()). My main point is someone should get that clarification first before trying to solve the problem.

> One method would be to use a *.pth file, which site executes import statements inside. Then it’ll automatically be loaded.

This is an interesting idea. The biggest pro is that, with a proper import hook, we can enable complex behavior (e.g. loading specific modules in debug mode depending on the value of an env variable and/or a config file).

The biggest cons are:

  1. executing arbitrary code in pth files is considered "a hack", see e.g. PEP 648, Python issue 24534 and Python issue 33944.
  2. the import hook will be activated only "at some point" during the startup process. If for any reason some code needs to import an hpy module before that point, the import will fail.

The cleanest solution would probably be a custom FileLoader class, right? How (un)likely is it to get that kind of support from the CPython developers?

I finally asked a question related to this issue here.

Thank you for asking this @paugier! I look forward to seeing what the official answer is. I suppose we can safely assume that the current behavior will never change, but it's better to have some official clarification.

As for this issue, I think that the only reasonable solution is to write a custom importer as @cklein suggests, and install it using the "pth hack", until we find a better solution and/or CPython provides a cleaner way to do it.

As usual, if anyone feels like working on that, contributions are welcome :)

For the editable installs Meson-python does something which could be used for HPy:

This call adds to sys.meta_path a meta path finder which changes how extensions are imported.

HPy could also provide a Python API to ensure that its own MetaPathFinder is added.
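To make the idea concrete, here is a rough sketch of such a finder. Everything in it is an assumption for illustration: UniversalFinder, UniversalLoader and the load_func callback are hypothetical names, the helper reuses the make_universal_extensions_importable name suggested earlier in this thread, and the real loading step would be whatever hpy.universal provides, which is not shown here.

```python
import importlib.abc
import importlib.util
import os
import sys

HPY_SUFFIX = ".hpy.so"  # suffix used for universal modules in this thread


class UniversalLoader(importlib.abc.Loader):
    """Hands the located file to a backend-specific loading function."""

    def __init__(self, path, load_func):
        self.path = path
        self.load_func = load_func

    def create_module(self, spec):
        return None  # let the import system create a plain module object

    def exec_module(self, module):
        # load_func is a placeholder for the real HPy loading logic.
        self.load_func(module, self.path)


class UniversalFinder(importlib.abc.MetaPathFinder):
    """Looks for <name>.hpy.so on the search path."""

    def __init__(self, load_func):
        self.load_func = load_func

    def find_spec(self, fullname, path=None, target=None):
        tail = fullname.rpartition(".")[2]
        for entry in (path if path is not None else sys.path):
            candidate = os.path.join(entry or os.curdir, tail + HPY_SUFFIX)
            if os.path.isfile(candidate):
                loader = UniversalLoader(candidate, self.load_func)
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=loader
                )
        return None  # defer to the remaining finders


def make_universal_extensions_importable(load_func):
    """Hypothetical helper: install the finder once, ahead of the defaults."""
    if not any(isinstance(f, UniversalFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, UniversalFinder(load_func))
```

Because the finder is inserted at the front of sys.meta_path, a foo.hpy.so file would take priority over a foo.py next to it, which mirrors the behavior requested at the top of this issue. A package would still have to call the helper (or rely on a .pth file) before the first universal module is imported.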