bazelbuild/rules_python

py_binary with hermetic toolchain requires a system interpreter

jpgxs opened this issue ยท 29 comments

jpgxs commented

๐Ÿž bug report

Affected Rule

py_binary

Description

Running py_binary without a system interpreter (using a toolchain configured with python_register_toolchain)
fails with the following error:

/usr/bin/env: 'python3': No such file or directory

After installing a system python3, the rule runs fine and uses the correct Python interpreter (not the system one).

๐Ÿ”ฌ Minimal Reproduction

Files

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "rules_python",
    sha256 = "cdf6b84084aad8f10bf20b46b77cb48d83c319ebe6458a18e9d2cebf57807cdd",
    strip_prefix = "rules_python-0.8.1",
    url = "https://github.com/bazelbuild/rules_python/archive/refs/tags/0.8.1.tar.gz",
)
load("@rules_python//python:repositories.bzl", "python_register_toolchains")
python_register_toolchains(
    name = "python3_10",
    python_version = "3.10",
)
# BUILD
exports_files(["interp_version.py"])
py_binary(
    name = "interp_version",
    srcs = ["interp_version.py"],
    main = "interp_version.py",
)
# interp_version.py
import sys
print(sys.version_info)

Repro

Using the ubuntu:focal Docker image:

root # bazel run //:interp_version
INFO: Analyzed target //:interp_version (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:interp_version up-to-date:
  bazel-bin/interp_version
INFO: Elapsed time: 0.773s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
/usr/bin/env: 'python3': No such file or directory  <-- Error

root # apt-get install python3-minimal
[...]

root # python3 --version
Python 3.8.10


root # bazel run //:interp_version
INFO: Analyzed target //:interp_version (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:interp_version up-to-date:
  bazel-bin/interp_version
INFO: Elapsed time: 0.707s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
sys.version_info(major=3, minor=10, micro=2, releaselevel='final', serial=0)  <-- 3.10.2

๐ŸŒ Your Environment

Operating System:

Ubuntu Focal (20.04.4 LTS)

Output of bazel version:
Bazelisk version: v1.11.0
Build label: 5.1.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Apr 8 15:49:48 2022 (1649432988)
Build timestamp: 1649432988
Build timestamp as int: 1649432988

Rules_python version:

0.80.1

Hmm, this is tricky, not really sure if it's "bug" per se in rules_python. The launcher generated by bazel itself has a reference to this shebang line.
We should at minimal allow setting stub_shebang on py_runtime. It may be tricky to get the path to the toolchain python in there, not sure.

jpgxs commented

Thanks for the rapid response ๐ŸŽ. It's certainly not a critical issue for me as the workaround is trivial.

Do you think it would be useful to add a note in the README to save future hackers time debugging?

I'm facing the same issue where using the rules_docker to create a docker image with my python code based on ubuntu:latest fails with:

ERROR: for example  Cannot start service example: OCI runtime create failed:
container_linux.go:349: starting container process caused
"exec: \"/usr/bin/python\": stat /usr/bin/python: no such file or directory": unknown

Do you have any solution for that?

I think this is a duplicate of bazelbuild/bazel#8446?

Agree that the README could make clearer the stub script python dependency. @mattem has Aspect successfully used bazelbuild/bazel@763dd0c anywhere?

It appears to have landed in 5.x.x, and allows for wiring up the toolchain's interpreter as the stub script's interpreter.

I have also successfully used the stub shebang customization. Its a viable solution here.

python_register_toolchains sets up the py_runtime on behalf of the user, so it would have to be responsible for detecting that the Bazel version is high enough and then adding the stub_shebang = ... data.

In the recent past, we had a rough time trying to accurately detect the user's Bazel version: #522

In the meantime I've thrown up #698.

Just commenting in an in-the-wild example of stub_shebang usage that I found linked in the Bazel Slack: SeleniumHQ/selenium@210bcf5#diff-7fc57714ef13c3325ce2a1130202edced92fcccc0c6db34a72f7b57f60d552a3R20

That codebase is on Bazel 5.1.1 and so can take advantage of the new functionality.

@mattem @f0rmiga @hrfuller I've thrown up a draft PR of using stub_shebang and doing version checking: #699. I've added some commentary on it too.

Works with my very basic bazel run testing.

It is a bit of a smell that bazel uses a stub that has a dependency on the host environment. Another workaround / hack to this on macos is to put python on the PATH in the tools/bazel shim.

export PATH="$(brew --prefix)/opt/python/libexec/bin":$PATH

(or whatever is similar for your environment)

Question: does anyone know why the first-stage python_stub_template.txt needs to be written in python? Do other rulesets have this problem where they try to reuse a toolchain as part of the first-stage launcher?

@groodt I don't think placing the interpreter in tools/bazel solves any real problems. It would require extra steps before bazel build. Also, it wouldn't work with RBE.

It is a bit of a smell that bazel uses a stub that has a dependency on the host environment.

There's almost no way to avoid a host environment dependency here, though. e.g /bin/bash is also a problematic host dependency (doesn't exist on Windows, Macs have ancient versions of it, etc). The only real way to avoid it is to build a native executable (which is what Bazel does for Windows, and largely what we do at Google).

Question: does anyone know why the first-stage python_stub_template.txt needs to be written in python?

It doesn't. Jumping in the time machine, it came around circa 2004 using Python 2.2 (and was ~8 lines long). I think it actually predates the template-expansion support! Internally, our stub script is actually a few lines of bash to do some misc setup before invoking Python for the rest of the startup code.

We should at minimal allow setting stub_shebang on py_runtime. It may be tricky to get the path to the toolchain python in there, not sure.

Well, I've been wrestling with a similar problem within Google as we try to get rid of the last usages of relying on a system-installed Python, and it's been a pain, so I'll share a bit of my findings/conclusions. HTH.

IMHO/IME, the stub_shebang attribute is basically useless for achieving hermetic builds.

Half the problem comes from remote execution, like f0rmiga said. Because it's a string attribute, it can't carry any inputs along with it. Absolute paths are (pretty much by definition) machine/platform specific, which prevents using them for remote execution. A relative path has to refer to some build artifact, but then you have the problem of using a relative path in a shebang and knowing the relative path to use (either an execroot relative path or a runfiles relative path). Relative paths work in shebangs, but can be a bit finicky because they rely on the PWD. Relative paths are also finicky because of things like binaries nested in binaries (or other intermediaries that chdir)

I haven't followed these hermetic toolchain PRs too closely, but I'm guessing, fundamentally, they rely on defining an in-build py_runtime? i.e.,py_runtime(files = ..., interpreter = ...)? If so, this leads to the other half of the chicken/egg problem: the py_runtime rule puts an in-build interpreter into the runfiles of the binary, but most of what the stub script does is find the runfiles directory. If the interpreter is in the runfiles, how can you use the interpreter to find the runfiles? ๐Ÿ” ๐Ÿฅš

Unfortunately, I haven't had time to really investigate solutions to this. The two avenues I wanted to investigate are (a) changing the stub script to a two-phase bash script (the first phase uses bash to simply find the runfiles dir and interpreter, then passes off to the interpreter), or (b) generate a native startup executable that does (a) (I'm guessing this is basically what the Windows launcher does?), or (c) maybe more involved changes in the py_binary/py_runtime could help address this, maybe in combination with a/b.

Thanks for the insight. In my opinion, it does seem like solving the "bootstrap problem" would be best solved in a way that doesn't involve the runtime itself due to that chicken-egg.

It would be incredible if it was solved with some sort of native launcher inside bazel that had enough degrees of freedom to support any language rules or runtime.

@rickeylev Is a reasonable sketch of the problem this pseudo-code:
exec(runfiles-interpreter, ["the-thing.py"])

Bazel needs something to create a platform independent "exec" function (portable binary) without a runtime that can find the interpreter and use it to interpret a script. It feels like something like that built into bazel would enable all languages that bundle an interpreter in the runfiles to launch? It's not needed for languages that compile statically without a separate runtime such as rust or golang or CPP etc.

Is a reasonable sketch of the problem this pseudo-code

Yeah, that'd about be the psuedo-code of the generated executable. As a user, it'd have to be spelled more like how a rule impl would interact with it:

def _impl(ctx):
  executable = create_runfiles_based_runtime_starter_thingamig(
    output = ctx.label.name,
    runfiles_path_to_runtime=py_interpreter.path,
    argv=[ctx.file.main.path]
  )
  return [DefaultInfo(executable=executable, ...)]

create a platform independent "exec" function (portable binary)

Well, a portable binary as far as a user (rule impl) is concerned. There's basically no way to have a single chunk of bytes that describe an executable that is cross platform.

It feels like something like that built into bazel would enable all languages that bundle an interpreter in the runfiles to launch?

Yes, I agree. There's a pretty wide variety of interpreted languages out there + things like Java

We ran into this issue and the workaround we use is to embed a little bash script in the shebang line like this:

py_runtime(stub_shebang="#!/usr/bin/env -S /bin/bash -c '$0.runfiles/<path to Python interpreter> $0')

(yes, this works)

py_runtime(stub_shebang="#!/usr/bin/env -S /bin/bash -c '$0.runfiles/<path to Python interpreter> $0 \"$@\"'") worked for us.

can find the interpreter and use it to interpret a script

Based on your description it sounds like getting toolchain binary (same as in python/current_py_toolchain.bzl) and run it with ctx.run(...).

@rickeylev this is going to be even more of a problem with bzlmod. We no longer have the user register a toolchain.

Any ideas on how to fix this?

This issue is because the bootstrap script is implemented in Python, so needs some python interpreter for itself to run.

bzlmod won't affect this, neither will changes to toolchain registration.

This issue is fixed in https://github.com/aspect-build/rules_py because it doesn't have any Python bootstrap script.

The #!/usr/bin/env -S /bin/bash hack is great but doesn't work on POSIX implementations of /usr/bin/env implementations such as toybox. Here is a py_runtime implementation that works on POSIX env implementations:

ATTRS = {
    "interpreter": attr.label(
        doc = "The Python interpreter.",
        allow_single_file = True,
        executable = True,
        cfg = "exec",
    ),
    "python_version": attr.string(
        doc = "Whether this runtime is for Python major version 2 or 3. Valid values are `PY2` and `PY3`.",
        default = "PY3",
        values = ["PY2", "PY3"],
    ),
    "files": attr.label_list(
        doc = "The set of files comprising this runtime. These files will be added to the runfiles of Python binaries that use this runtime.",
    ),
}

def implementation(ctx):
    # `rules_python` needs a Python interpreter to launch Python
    # This hashbang:
    #   - Overrides to launch the POSIX shell
    #   - POSIX shell ignores the first triplet quote
    #   - Uses the script path to find the interpreter in the runfiles
    #   - Launches the same script within the Python interpreter
    #   - Python ignores the shell script because it is in a triplet quote
    #   - The triple quote is needed because the `__future__` declarations must be seen first
    #   - Python runs the script to completion and returns back into the POSIX shell
    #   - The POSIX shell then exits before reading the rest of the Python code
    hashbang = '''#!/usr/bin/env sh
"""set" -eu
"$0.runfiles/{}" "$0" "$@"
exit
"""
'''.format(ctx.file.interpreter.path.removeprefix("external"))

    return PyRuntimeInfo(
        interpreter = ctx.file.interpreter,
        python_version = ctx.attr.python_version,
        stub_shebang = hashbang,
        files = depset(transitive = [t.files for t in ctx.attr.files]),
    )

py_runtime = rule(
    doc = "Creates a hermetic Python runtime.",
    implementation = implementation,
    attrs = ATTRS,
    provides = [
        PyRuntimeInfo,
    ],
)

This can then be used to configure a Python toolchain that uses the interpreter to bootstrap itself:

load("@rules_python//python:defs.bzl", "py_runtime_pair")
load(":py_runtime.bzl", "py_runtime")

py_runtime(
    name = "runtime",
    files = ["@python//:files"],
    interpreter = "@python//:python3",
)

py_runtime_pair(
    name = "info",
    py3_runtime = ":runtime",
)

toolchain(
    name = "toolchain",
    toolchain = ":info",
    toolchain_type = "@rules_python//python:toolchain_type",
)

That assume you have registered a rules_python toolchain in MODULE.bazel:

bazel_dep(name = "rules_python", version = "0.25.0")

python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(
    configure_coverage_tool = True,
    python_version = "3.9",
)
use_repo(python, python = "python_3_9")

If we put this in @rules_python//python:py_runtime.bzl and override the default stub_shebang it'll make rules_python hermetic on POSIX(y) systems. Would the project accept a change like this?

Would the project accept a change like this?

Oh wow that is clever! I especially like that it gives us some way to do a bit more logic to handle any special cases.

I think the short answer is yes, we'd accept patches to help improve the situation using tricks in stub_shebang. The code for rules_python's toolchain generation is mostly in python/repositories.bzl (grep for py_runtime). This would help Bazel 5.4 and Bazel 6.

For Bazel 7, we can more directly solve this because the bootstrap template can be in rules_python -- see python/private/python_bootstrap_template.txt. Note this is currently only used by the not-yet-activated starlark implementation of the rules, but it would be easy to make the bazel-native implementation use it in the interim (just modify the repositories.bzl as above).

Once the Starlark implemenation is activated, then we'll have a lot more options.

Would it be reasonable to ship with a very small cross-platform launcher executable built with https://github.com/jart/cosmopolitan ? (Mostly joking, I suspect the solutions above are far more practical.)

There's been discussions on similar ideas before Cross-platform native launchers for Python
(some may find it surprising that Windows already has a native launcher)

There's also other projects such as https://github.com/a-scie/jump

Ultimately, we need some sort of "bootstrapper" that understands runfiles. There's a couple things to consider more deeply in my opinion:

  • Not all python interpreters are "relocatable". Yes python-build-standalone has been compiled to be standalone, and it is very convenient, but it does have quirks and while rare, if these matter, I suspect they matter A LOT. cpython is not the same as a JVM which truly is relocatable (yet https://peps.python.org/pep-0711/). If we make a decision one way or the other here, it needs to be acknowledged if it's intended to use any cpython (including any in-tree cpython, or only python-build-standalone)
  • Needs to be clear on behaviour for "tools" (executed as part of bazel action or repo rules) vs building an application artifact that is built to be executed outside the bazel build itself
  • It may be better to provide users with additional abstractions to compose desired behaviour, rather than overloading py_binary (spit-balling, but perhaps introduce a py_application or a py_environment or both) which composes a launcher, interpreter and py_binary

The above shebangs were not working for me for py_test targets because py file script path is referenced by path from within .runfiles in test-setup.sh

Longer version was needed:

stub_shebang = r"""#!/usr/bin/env -S /bin/bash -c 'if [[ $(pwd) =~ (.*\\.runfiles[/$]) ]]; then ${BASH_REMATCH[0]}/<path to python interpreter> $0 "$@"; else $0.runfiles/<path to python interpreter> $0 "$@"; fi'""",

I spent the weekend looking into this. I have a prototype that looks promising and was able to get things working. The gist of how it works is a two-stage bootstrap.

Stage one is some shell code to locate the runfiles directory and call the interpreter with the second stage bootstrap. In the case of executable zip files, it extracts the zip first. The overall goal of this stage is to get into the target interpreter as soon as possible.

Stage two is Python code that runs before the application's actual main file. It handles the heavy lifting. It sets up sys.path, handles coverage collection (if applicable), cleaning up of the extracted zip file (if applicable) and then uses runpy to run the application's main file. The runpy module handles invoking another file as if it was the main entry point.

I'm quite liking this two-stage design. It has several benefits:

  • Because sys.path is setup, the issue of the PYTHONPATH environment variable goes away
  • Managing coverage is a lot easier. I changed the code to use the coverage APIs directly
    instead of calling it as a subprocess.
  • Because Python doesn't re-exec() the process, or invoke a subprocess, a couple issues
    relating to that go away (something about windows and spaces, another about
    signal handling, coverage collection is greatly simplified)
  • Because we're running under the target interpreter (instead of some unknown interpreter),
    we have some more freedom in setting up, tearing down, and instrumenting the runtime,
    which feels very freeing.

There's two things I've found annoying about this design

The first is due to our support for regular zip files (python3 myapp.zip) and executable zip files (./myapp.zip). Those cases basically require their own bootstraps.

  • ./myapp.zip requires stage1 to unzip to a temp directory, locate the interpreter, and then run $interpreter __main__.py. In this case, __main__.py can assume extraction has occurred.
  • python3 myapp.zip requires the same __main__.py to be invoked directly, but __main__.py can't assume it has been extracted.

The second is that our current toolchain API has a single bootstrap template (which acts as the template for the (non-zip) foo executable and as the template for the zip __main__.py), but this design requires separate templates (one for stage1, one for stage2). I'd like to avoid this being a breaking change, but that might be rather involved in this case. We'll see how it looks as i clean it up.

Some notes to self about the types of invocations that need to be handled.

  1. Linux/mac, build_zip=false, direct invocation: bazel build :foo; bazel-bin/foo
    • foo needs to be bash script that locates the interpreter and main, then runs <foundpython> stage2.py
  2. Linux/mac, build_zip=true, direct invocation: bazel build --python_build_zip :foo; bazel-bin/foo
    • foo needs to be a zip file that is executable and self-extracting. i.e. it has a bash preamble that extracts, locates the interpreter, and runs <foundpython> stage2.py
  3. Windows, build_zip=false, direct invocation: bazel build :foo; bazel-bin/foo.exe -> <somepython>.exe foo
    • foo needs to be python file, i.e. the stage2.py file
  4. Windows, build_zip=true, direct invocation: bazel build --python_build_zip :foo; bazel-bin/foo.exe -> <somepython>.exe foo.zip
    • This case is a bit weird. The windows launcher can be told the runfiles path to the interpreter, and it'll try to use it. However, I don't see code that extracts the zip, so I don't see how it could be using an in-build interpreter. After it constructs the would-be runfiles path, it checks if the path exists, and if not, uses simply python (i.e. from path) to run the zip.
    • foo.zip needs to have __main__.py. However, it must handle extract itself and can't assume the current interpreter is the correct one.
  5. Manually passing zip to some python: bazel build {maybe build_python zip} :foo; <somepython> foo.zip.
    • It's not clear if doing this with --build_python_zip=true is a sensible use case. If you specified that flag, then the output is an executable zip. So why manually pass it?
    • This case is basically Windows+build_python_zip=true. The foo.zip/__main__.py file is how the binary is entered, and it has to handle extract and can't assume the current runtime is the correct one.

So I think my takeaway here is...

  • The foo.zip/__main__.py file can be akin to the stage1 bash code, just implemented in Python. It needs to locate the runfiles and interpreter, then invoke a stage2.py file.
  • Great, so there's basically three bootstrap templates: stage1 bash, stage1 python, and stage2 python.
  • ๐Ÿ˜’
  • Case 3, 4, and 5 and basically the same: An unknown python is being directly called on us (either foo or foo.zip/__main__.py). So those files have to be stage1 python implementations. On windows, this means foo and foo.zip/__main__.py are the same content. On linux/mac, the two can be distinct (e.g. a stage1 bash instead)

Just out of curiosity: I searched the documentation for "zip" and couldn't find a single mention. Would it be an option drop support for some undocumented edge cases if this makes the solution much easier/cleaner/earlier?

--build_python_zip is in the Bazel docs (not in rules_python docs)