ofek/extensionlib

Question: Simple example

peekxc opened this issue · 21 comments

This is more of a question born out of my own ignorance on Python's build internals. I would like to use extensionlib to build native extension modules interfaced with pybind11. There is a simple example of how to do this with setuptools + setup.py (and other build tools). I'm trying to figure out how to do this with extensionlib + hatch but I can't figure it out.

  1. Do I use the custom build hook to compile native extensions, or do I need to make my own build-hook plugin?

Something like:

from hatchling.builders.hooks.plugin.interface import BuildHookInterface
class CustomHook(BuildHookInterface):
    def initialize(self, version, build_data):
        if self.target_name == 'wheel':
            ...
  2. ...or does extensionlib scaffold these build hooks via entry-points? If I have something like this:
# pyproject.toml
...
[project.entry-points.extensions]
hatch_build = "my-pkg:ExampleExtensionModules"

then this should instantiate an ExampleExtensionModules class in a hatch_build.py file, yes? And then build things with the build runner? Something like:

# hatch_build.py
from extension.runner import BuildRunner
from extension.interface import ExtensionModules

class ExampleExtensionModules(ExtensionModules):
    def __init__(self, name: str, root: str, metadata: dict, config: dict):
      super().__init__(name, root, metadata, config)
      runner = BuildRunner('.', ...)
      ....

    def inputs(self):
      return ["src/my_pkg/ext/"]  # this stores my __init__.py + extension module source files

    ....

I can't seem to parse how to connect the build script with the pyproject.toml file / with hatch build; hatch build --ext silently returns without error and without building anything in the simple example I'm trying to get going now.

ofek commented

Hey! The idea is every extension builder like pybind11 will implement this and then Hatchling will add an always-on build hook that uses https://ofek.dev/extensionlib/runners/

I haven't done that yet because I'm waiting on @henryiii to do a PoC with I think scikit-build?

I haven't done that yet because I'm waiting on @henryiii to do a PoC with I think scikit-build?

Well although a PoC w/ scikit-build would be informative, I would hope the pybind11 extension runner would be independent of scikit-build. Since I use Meson instead of CMake for building C++, I have no interest in scikit-build.

The idea is every extension builder like pybind11 will implement this and then Hatchling will add an always-on build hook that uses https://ofek.dev/extensionlib/runners/

Ahh, ok this may be outside of my area of expertise. I may poke around at trying this myself but will be watching the pybind11 repo otherwise.

Pybind11 is a binding tool, not a build system. Scikit-build and mesonpy are build systems, and they would support extensions. I can't speak for mesonpy, but I'll be making sure pybind11 and scikit-build work well together (in fact, as just announced at SciPy, I'm funded to work on scikit-build over the next three years). We've also discussed making a "simple" builder that would generate the configuration to build without hand-written CMake code - that's a bit down the road, though.

So as an outsider here whose interest is mainly "how does this fit in the packaging ecosystem?" I'm still confused over where this fits.

It sounds like @ofek is saying that this is a protocol that allows PEP 517 build backends (like hatchling), on being told that "file X is an extension managed by extension builder Y", to call Y in a standardised way to build[1] the file X that the backend needs. Which is cool, but what is Y? It sounds like it's something like mesonpy or scikit-build - but I thought those two were PEP 517 backends? So is this for one backend to call another? Or is the expectation that we'll see a crop of a new sort of component, the "extension builder", which focuses on implementing a build process and exposing that functionality via extensionlib?

If enabling a new sort of "extension builder" component is the intention, then this sounds really cool and I look forward to seeing it bear fruit!

Footnotes

  1. This actually sounds general enough that why limit it to extensions? I could see other uses, such as a FFI tool that takes a description of a C interface and generates a Python file that wraps that interface using ctypes, with no extension involved.

ofek commented

Yes that is the intention 😄

I could see other uses, such as a FFI tool that takes a description of a C interface and generates a Python file that wraps that interface using ctypes, with no extension involved.

Yes, that would be valid.

This actually sounds general enough that why limit it to extensions?

I think generalizing build hooks should be a separate endeavor.

mesonpy or scikit-build - but I thought those two were PEP 517 backends

Mesonpy predates this. Scikit-build is gaining a PEP 517 backend because writing another PEP 517 backend is a rite of passage for anyone in packaging because the proposal predates this idea & it's a simple "easy start" point for users. Though depending on how this goes, maybe we can just rely on this. I'm not against the idea.

I'm giving a slide today at SciPy where I give an imaginary example:

[build-system]
requires = ["hatchling", "scikit-build-core", "mypyc"]
build-backend = "hatchling.build"

[project]
name = "example"
version = "0.1.0"

[[extensions]]
build-backend = "scikit_build_core.extension"
src = "src1/CMakeLists.txt"

[[extensions]]
build-backend = "scikit_build_core.extension"
src = "src2/CMakeLists.txt"

[[extensions]]
build-backend = "mypyc.extension"
src = "src/*.py"

I was imagining it would look something like that.

ofek commented

As currently implemented it would look like:

[build-system]
requires = ["hatchling", "scikit-build-core", "mypyc"]
build-backend = "hatchling.build"

[project]
name = "example"
version = "0.1.0"

[[project.extensions.scikit-build]]
src = "src1/CMakeLists.txt"

[[project.extensions.scikit-build]]
src = "src2/CMakeLists.txt"

[[project.extensions.mypyc]]
src = "src/*.py"

Extensions are not logically part of the project metadata. If they were under anything, they'd be under [[build-system.extensions]].

IMO, the extension build backend ideally should have the same structure as selecting the build backend; it's a familiar interface, and allows a package to provide more than one extension build backend. We'll also need to think about how we'd send different arbitrary configuration to each extension, which might bias us to a .<package-name>.

(Totally fine to have the above API for a draft to work on, though, just sharing my opinions).
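
For readers unfamiliar with how a `build-backend` string is resolved, here is a minimal sketch of the PEP 517 style `module.path:object.path` lookup that an extension `build-backend` key could reuse. The helper name `resolve_backend` is hypothetical, and standard-library names stand in for `scikit_build_core.extension`, which is only an imagined module in this thread.

```python
import importlib
from typing import Any


def resolve_backend(spec: str) -> Any:
    """Resolve a "module.path:object.path" string to a Python object.

    The part after the colon is optional, as it is for PEP 517's
    build-backend key.
    """
    module_path, _, object_path = spec.partition(":")
    obj: Any = importlib.import_module(module_path)
    # Walk the optional dotted attribute path that follows the colon.
    for attr in filter(None, object_path.split(".")):
        obj = getattr(obj, attr)
    return obj


# Stand-in demonstration with standard-library names (assumption: a real
# extension backend would be resolved the same way).
backend_module = resolve_backend("os.path")
join = resolve_backend("os:path.join")
```

Because each `[[extensions]]` entry would carry its own string, one package could expose several extension backends under different module paths.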

ofek commented
  1. [[build-system.extensions]] is much better!
  2. A package can define an arbitrary number of extension builders https://ofek.dev/extensionlib/builders/#plugin-registration

PEP 518 disallows anything outside of [tool], so [[build-system.extensions]] (or [[project.extensions]]) would need to be standardised in a PEP.

Or even [[extensions]], yes. I was thinking the PoC (here) would use [[tool.extensionlib.extensions]] as the placeholder. I’m also hoping to see if we can propose a Protocol vs. requiring a library / ABC (though extensionlib would provide the Protocol and helpers).

PEP 518 disallows anything outside of [tool], so [[build-system.extensions]] (or [[project.extensions]]) would need to be standardised in a PEP.

You've stated your preference recently that any PEP should have an initial implementation and use before it's standardised, to avoid adopting PEPs that are then not used. This seems to contradict PEP-518, because I'd rather not need to migrate to a whole new namespace once it's approved 🤔

I’m also hoping to see if we can propose a Protocol vs. requiring a library / ABC

If I'm understanding correctly, a standard would be essentially like PEP 517, defining a function that modules implementing the PEP should provide. I guess that's basically a protocol (albeit simply in words, rather than in the sense of typing.Protocol)?
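
To make the Protocol-versus-ABC distinction concrete, here is a minimal sketch using `typing.Protocol`. The protocol name `ExtensionBuilder` and its `build` method are hypothetical illustrations, not extensionlib's actual interface. The point is that a builder conforms by shape alone, without importing or subclassing anything from a shared library.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class ExtensionBuilder(Protocol):
    """Hypothetical protocol; the method name and signature are illustrative."""

    def build(self, src: str, build_dir: str) -> List[str]:
        ...


class FakeCMakeBuilder:
    # No base class: this conforms structurally, which is what a
    # words-only standard in the style of PEP 517 effectively asks for.
    def build(self, src: str, build_dir: str) -> List[str]:
        return [f"{build_dir}/example.so"]


class NotABuilder:
    pass


# runtime_checkable isinstance only checks that the method exists,
# not that its signature matches.
conforms = isinstance(FakeCMakeBuilder(), ExtensionBuilder)
rejects = not isinstance(NotABuilder(), ExtensionBuilder)
```

With an ABC, `FakeCMakeBuilder` would instead need to inherit from a class shipped by a common dependency.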

If you are going to standardise this, though, what you propose in the PEP is up to you (and whoever you get as PEP sponsor). Feel free to ignore my comments if they aren't helpful.

This seems to contradict PEP-518, because I'd rather not need to migrate to a whole new namespace once it's approved

Agreed, that's unfortunate, but necessary if you want to be pedantic.

I don't recall the statement you're referring to, so I'm reluctant to try to clarify "what I meant" because I fear I'll get called out if I'm inconsistent. But I will say that my main concern is getting something that we have a reasonable assurance that any PEP I accept isn't going to fall apart at the seams when people try to implement it. If someone can do that without a prototype implementation, then that's fine. And if someone else volunteers to be PEP-delegate, they can pick their own criteria.

But can we park the PEP process debates for now? I don't want to hijack this issue with off-topic discussions.

I guess my core confusion was I thought extensionlib (+hatchling) was at its core providing an alternative to distutils + setup.py--essentially doing what scikit-build-core does (or is aspiring to do in the future when distutils is removed as a dependency?), but @pfmoore cleared things up.

I could see other uses, such as a FFI tool that takes a description of a C interface and generates a Python file that wraps that interface using ctypes, with no extension involved.

This sounds to me like a fairly non-trivial task (perhaps already achieved to some degree by Cython?), but I could be wrong. Pybind11 provides this non-trivial FFI-functionality already for me in C++, which is really what I'm concerned with. I guess ctypes might be the Python standards-compliant way to go about it with C, but that's beyond the scope of my original question.

Well although a PoC w/ scikit-build would be informative, I would hope the pybind11 extension runner would be independent of scikit-build. Since I use Meson instead of CMake for building C++, I have no interest in scikit-build.

If you're already prepared to use Meson, then Meson implements its own classification for:

  • installing source files (pymod.install_sources(....))
  • building and installing compiled extensions (pymod.extension_module(...))

In the latter case, you'd be building it much the same way you'd build a C/C++ library -- although it's actually a subclass of C/C++ libraries that builds one with the .cpython-310-x86_64-linux-gnu.so style extension -- simply define the library name, provide source files and cpp_args and link_args, and any dependencies it needs to be linked to. pybind11 does not need anything coupled to an extension runner (?) for this, because it provides dependency lookup methods (pybind11-config, a cmake package, a Meson wrap) and it is really as simple as adding the headers path to the extension you want to build. So you can simply do dependency('pybind11') in Meson and carry on.

I'm not sure there's any great need in that case for a protocol to tell Meson how to either build extensions or add dependencies to them. mesonpy, the PEP 517 build backend, might choose to do so... I guess the idea would be to run one meson project per extension? The cmake examples seem to imply that will be the case. But currently, mesonpy has a satisfactory solution using one meson project per wheel, and leveraging Meson's (python-aware) output manifest.

...

As far as an example of using extensionlib goes, from my limited perusal it seems like a hook-based approach to code your own simple build system for performing tasks, including but not limited to C extensions. It's probably quite useful for a PEP 517 backend that doesn't have that built in, but it's not clear to me how advantageous that is for stepping out of a general-purpose build system such as Meson or CMake.

Am I correct in assuming that this is really just a replacement for hatch_build.py?

ofek commented

Am I correct in assuming that this is really just a replacement for hatch_build.py?

No.

The idea is for any PEP 517 backend to be able to do 2 things:

  1. trigger user-defined extension builds
  2. get the location of outputs
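
The two operations above can be sketched from the backend's side as follows. This is an illustration under stated assumptions: the class `TouchBuilder`, its `build` and `outputs` methods, and the `run_builders` helper are all hypothetical names, and the "compilation" just writes an empty placeholder file.

```python
import tempfile
from pathlib import Path
from typing import List


class TouchBuilder:
    """Hypothetical builder that "compiles" by writing an empty file."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._outputs: List[Path] = []

    def build(self, build_dir: Path) -> None:
        out = build_dir / f"{self.name}.so"
        out.write_bytes(b"")  # placeholder for real compiled output
        self._outputs.append(out)

    def outputs(self) -> List[Path]:
        return list(self._outputs)


def run_builders(builders: List[TouchBuilder], build_dir: Path) -> List[Path]:
    collected: List[Path] = []
    for builder in builders:
        builder.build(build_dir)             # 1. trigger the extension build
        collected.extend(builder.outputs())  # 2. get the location of outputs
    return collected


with tempfile.TemporaryDirectory() as tmp:
    paths = run_builders([TouchBuilder("a"), TouchBuilder("b")], Path(tmp))
    names = [p.name for p in paths]
    all_exist = all(p.exists() for p in paths)
```

The backend never needs to know how each builder produces its files, only where they end up.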

The current situation is that one must use setuptools or an extension builder that also knows how to write sdists and wheels.

But isn't that exactly what hatch_build.py is, except only supported by hatch? Sorry if I was unclear -- when I said that extensionlib looks like a replacement for hatch_build.py, I meant that extensionlib looks like a replacement for hatch_build.py "that other PEP 517 backends can use too".

The current situation is that one must use setuptools or an extension builder that also knows how to write sdists and wheels.

You also must either use setuptools or a pure python file gatherer that also knows how to write sdists and wheels. Does that mean this, too, should become a PEP-defined hook API?

More relevant to my interests, I guess... I'm curious what advantages there are to a hook-based extension builder serving as middleware between a PEP 517 backend and an independent general purpose build system such as Meson or CMake.

I can understand the motivation for a toolkit that helps one write simple but very specific one-off build steps (download this file, run that manpage conversion tool, msgfmt a series of localization catalogs from .po to .mo, things which are actually relatively common to do via setuptools by passing an overridden build cmdclass that cannot be done with build_py or build_ext and in general are really trivial to do without a full-fledged build system).

Would you advise Meson to create an extensionlib hook instead of building out a native PEP 517 backend as a successor to the third-party one people are currently using? And what would that look like?

I've been thinking about it in the context of scikit-build, and I think it might be best to provide both. I'd at least avoid deprecating a PEP 517 backend until we have experience with extensions. With the extensions backend, you get the ability to mix extensions in a single project (such as meson/cmake + mypyc + rust, for example). Larger, more complex projects sometimes need that - several projects I'm working with for https://iscinumpy.dev/post/scikit-build-proposal/ explicitly need this (and this was a benefit of everyone building off of setuptools in the past).

However, it's also a different structure - for the extensions, you have another tool (like hatchling) handling the Python files, and the cmake/meson build only has to produce the extension. While in the PEP 517 mode, cmake/meson could be responsible for copying over the Python files as well. Also PEP 517 mode might be simpler for users who just have a simple project that is primarily an extension - which is a huge fraction of projects. And this maps more directly for packages that support running cmake/meson directly & still produce a Python extension; those already copy Python files around.

So I don't think it's a successor, but a more general and powerful option. Ideally, I think the PEP 517 backend could be implemented via the extension (unless it's handling Python files itself, rather than delegating to meson, but even then I think it could share impl). Warning, though - I've been busy and traveling at SciPy and other things and could easily be oversimplifying this by missing something. I also haven't looked at the current implementation much at all.

I'm funded to work on scikit-build & the ecosystem for the next three years, and I'd be happy to help out with the implementation of a hook like this in mesonpy when I work on it for scikit-build-core.


Edit: Ahh, "third party one" being mesonpy? Hmm, Meson gaining an extensions hook and mesonpy being a PEP 517 backend that (always) calls the official extension hook might be interesting.

Mypyc still only works as a tool that internally imports and uses a temporary setup.py, right? In that respect it's a bit like cython except that cython also has a tool to just output the C sources, and that's what meson uses for its builtin cython support. IMO that's probably the best long-term solution for mypyc -- don't treat it different from other extensions, handle it via meson/cmake.

Meson also has builtin rust support without depending on cargo, although that currently doesn't support crates (it's planned for the long term, projects like Mesa need this).

One of the benefits of a general purpose build system is that stuff like rust support will probably end up builtin anyway, so using it from there may turn out well.

It's still not clear to me whether extensionlib wants to treat each output as a separate "project" or whether a hook can output multiple files?

Am I correct that one of the conclusions one could take from these discussions is that we effectively need to wait for the community of PEP 517 backends to mature (e.g. meson-py) a bit more prior to seeing other possible hatch build-hooks spring up?

I'm trying to wrap my head around what future hatch build-hook plugins could look like:

  • hatch + mypyc via hatch-mypyc
  • hatch + meson via... extensionlib + meson-py?
  • hatch + cmake via something similar
  • hatch + cythonize ...

I realize there are other things inside the python ecosystem that could be build hooks not necessarily tied to extension modules, but I thought I'd stay on topic.

Do these make sense, at least in theory?

ofek commented

effectively need to wait for the community of PEP 517 backends to mature

No. Anything that compiles or generates stuff with the intention of being placed in a wheel or sdist will use extensionlib/implement this new protocol. So I see meson builds wheels/sdists. In this hypothetical future it would no longer do that and only compile extensions which then any PEP 517 backend can place in the wheel or sdist.

Essentially, the goal is to separate the task of compilation from building wheels/sdists. So we're waiting on things that compile (some of which happen to be PEP 517 backends) to try out extensionlib.
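
As a rough illustration of that separation, the sketch below has the packaging side receive already-compiled outputs and do nothing except place them in an archive next to the Python files. The function `pack_archive` and the file names are hypothetical, and the result is a plain zip, not a valid wheel (it has no `.dist-info` metadata or `RECORD`).

```python
import io
import zipfile
from typing import Dict


def pack_archive(python_files: Dict[str, bytes],
                 extension_outputs: Dict[str, bytes]) -> bytes:
    """Place pure-Python files and prebuilt extension outputs in one zip.

    The packer does no compilation; it only copies bytes it was handed.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for arcname, data in {**python_files, **extension_outputs}.items():
            archive.writestr(arcname, data)
    return buffer.getvalue()


# Hypothetical inputs: the second dict stands for what a compiler-side
# tool would hand back to the backend.
blob = pack_archive(
    {"example/__init__.py": b""},
    {"example/_core.so": b"\x7fELF"},
)
with zipfile.ZipFile(io.BytesIO(blob)) as archive:
    members = sorted(archive.namelist())
```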

future hatch build-hook plugins

My plan is for Hatchling to add a default enabled build hook (whenever someone does an extensionlib PoC cc @henryiii) that runs these configured hooks based on whatever the PEP ends up saying.