bazel-contrib/rules_python

Implement venv/site-packages based binaries

Opened this issue · 22 comments

groodt commented

Context

This is a tracking issue to recognise that the lack of a site-packages layout causes friction when making use of third-party distribution packages (wheels and sdists) from indexes such as PyPI.

Outside Bazel and rules_python, it is common for distribution packages to assume that they will be installed into a single site-packages folder, either in a "virtual environment" or directly into a Python user or global site installation.

Notable examples are the libraries in the AI / ML ecosystem that make use of the nvidia CUDA shared libraries. These shared libraries contain relative rpaths in their ELF/Mach-O/DLL binaries, which fail to resolve when the packages are not installed as siblings in a site-packages layout.

There is also a complication introduced into the rules by the lack of a single site-packages folder: namespace packages in rules_python are all processed into pkgutil-style namespace packages. This seems to work, but wouldn't be necessary if site-packages were used.

Another rare issue is failure to load *.pth files. Python provides site-specific configuration hooks that can customize sys.path at startup. rules_python could perhaps work around this issue, but if a site-packages layout were used and discovered by the interpreter at startup, no workarounds would be necessary.
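
To illustrate what the interpreter's standard machinery already does for free in a site dir (a minimal sketch; the directory and file names are hypothetical):

import site
import sys

# Suppose "extra/" contains "my_paths.pth" listing one directory per line.
# addsitedir() appends "extra" to sys.path and also processes its *.pth
# files, appending each listed directory to sys.path.
site.addsitedir("extra")
print(sys.path)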

Distribution packages on PyPI known to have issues:

  • torch
  • onnxruntime-gpu
  • rerun-sdk
  • tensorflow-probability

Known workarounds

  1. Patch the third-party dependencies using rules_python patching support
  2. Use an alternative set of rules such as rules_py
  3. Patch the third-party dependencies outside rules_python and push the patched dependencies to a private index

Related

Proposed design to solve

The basic proposed solution is to create a per-binary virtual env whose site-packages contains symlinks to other locations in runfiles. e.g. $runfiles/mybin.venv/site-packages/foo would be a symlink to $runfiles/_pypi_foo/site-packages/foo
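
Roughly, the resulting runfiles layout would look like this (paths illustrative):

$runfiles/
  _pypi_foo/
    site-packages/
      foo/...
  mybin.venv/
    site-packages/
      foo -> ../../_pypi_foo/site-packages/foo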

TODO list

  • Add PyInfo.site_packages_symlinks. A depset of site-packages relative paths and runfiles paths to symlink to.
  • Make pypi-generated targets use this site-packages solution by default
    • Disable pkgutil-style __init__.py generation in pypi repo phase
    • Maybe refactor the pypi generation to use a custom rule instead of plain py_library.
  • Add a flag to allow experimentation and testing
  • Edge cases
    • if two distributions install into the same directory and/or have overlapping files
    • Handling pkgutil-style packages
    • Interaction of bootstrap=script vs bootstrap=system with this new layout
    • Handle platforms/cases where symlinks can't be created at build time (windows, using rules_pkg)
    • Handling if multiple versions of a distribution are in the deps and ensuring only one is used, while still respecting merge/conflict logic.

Hi experts, I'm new to Bazel and having an issue with nvimgcodec. I tried adding it in the preloading function, but preloading doesn't seem to work for me.

import ctypes
import glob
import os
import platform
import sys
from typing import Dict

def _preload_cuda_deps(lib_folder: str, lib_name: str) -> None:
    """Preloads cuda deps if they could not be found otherwise."""
    # Should only be called on Linux if default path resolution has failed
    assert platform.system() == 'Linux', 'Should only be called on Linux'
    lib_path = None
    for path in sys.path:
        nvidia_path = os.path.join(path, 'nvidia')
        if not os.path.exists(nvidia_path):
            continue
        print(f"Checking nvidia_path {nvidia_path}")
        if lib_folder == "nvimgcodec":
            # nvimgcodec ships its .so at the package root, not under lib/
            candidate_lib_paths = glob.glob(os.path.join(nvidia_path, lib_folder, 'libnvimgcodec.so.*[0-9]'))
        else:
            candidate_lib_paths = glob.glob(os.path.join(nvidia_path, lib_folder, 'lib', lib_name))
        print(f"Found candidate_lib_paths {candidate_lib_paths}")
        if candidate_lib_paths and not lib_path:
            lib_path = candidate_lib_paths[0]
        print(f"Found lib_path {lib_path}")
        if lib_path:
            break
    print(f"Preloading {lib_name} from {lib_path}")
    if not lib_path:
        raise ValueError(f"{lib_name} not found in the system path {sys.path}")
    ctypes.CDLL(lib_path)

def preload_cuda_deps() -> None:
    cuda_libs: Dict[str, str] = {
        'cublas': 'libcublas.so.*[0-9]',
        'cudnn': 'libcudnn.so.*[0-9]',
        'cuda_nvrtc': 'libnvrtc.so.*[0-9].*[0-9]',
        'cuda_runtime': 'libcudart.so.*[0-9].*[0-9]',
        'cuda_cupti': 'libcupti.so.*[0-9].*[0-9]',
        'cufft': 'libcufft.so.*[0-9]',
        'curand': 'libcurand.so.*[0-9]',
        'cusolver': 'libcusolver.so.*[0-9]',
        'cusparse': 'libcusparse.so.*[0-9]',
        'nccl': 'libnccl.so.*[0-9]',
        'nvtx': 'libnvToolsExt.so.*[0-9]',
        'nvimgcodec': 'libnvimgcodec.so.*[0-9]',
    }
    for lib_folder, lib_name in cuda_libs.items():
        _preload_cuda_deps(lib_folder, lib_name)

I have several Nvidia libraries and I wanted to use 'from nvidia import nvimgcodec', but multiple libraries have their own 'nvidia' directory under site-packages (e.g., pip_deps_cublas_cu11/site-packages/nvidia/ and pip_deps_nvimagecodec_cu11/site-packages/nvidia/), and 'from nvidia' always directs me to the cublas library.

My workaround is to copy the nvimgcodec library from my local Python environment to the Bazel directory, place it under pip_deps_nvimagecodec_cu11/site-packages/nvidia_img/, and then use 'from nvidia_img import nvimgcodec'.

I also tried just copying the nvimgcodec library from pip_deps_nvimagecodec_cu11/site-packages/nvidia and modifying the linking, but that didn't work, so I copied it from my local environment instead.

I'm not sure if I can add this as a patch because it doesn't really make sense. Do you know if there's a better solution for this? Thanks so much for your help!

I also tried just copying the nvimgcodec library from pip_deps_nvimagecodec_cu11/site-packages/nvidia and modifying the linking, but that didn't work, so I copied it from my local environment instead.

By the way, for this, I mean I could use 'from nvidia_img import nvimgcodec', but it seems the library is not initialized correctly. When I try to run the sample code to get a decoder, I just get None. I'm not sure if it's related to the copying and re-linking.

from nvidia import nvimgcodec
decoder = nvimgcodec.Decoder()
1e100 commented

Could someone comment on which version of rules_python this is not broken in? PyTorch did work before, at the very minimum. It'd be great to know if there's a rollback path.

1e100 commented

rules_py does seem to fix it, with some workarounds.

keith commented

rules_python has always worked this way. So yeah, it's not a regression.

I've renamed this to better reflect what is desired: to use a virtual environment with site-packages.

I've made a bit of progress on that front. --bootstrap_impl=script creates and uses a venv for the interpreter. So half the problem (use a venv) is solved.

The other half is populating the site-packages directory.

My current thinking is to have PyInfo carry info so that py_binary can then create symlinks in the site-packages directory. Probably a depset[tuple[str site_packages_path, str runfiles_actual_path]]. Or maybe depset[File marker]? It might be possible to combine the import strings (from attr.imports) and the marker file to infer the site-packages directories. For .pth files, I think we'd just glob() any (distro-root-level) .pth files and stick them in site-packages? Which would go into another PyInfo field.
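
To make the first option concrete, here's a sketch of what the binary side might do with such a depset (pure pseudocode for the idea, not actual rules_python code; venv_site_packages and runfiles_relative_path are hypothetical helpers):

# For each (site_packages_path, runfiles_actual_path) pair contributed by
# deps, the binary's venv gains one symlink:
#   $venv/lib/pythonX.Y/site-packages/<site_packages_path>
#       -> $runfiles/<runfiles_actual_path>
for sp_path, runfiles_path in info.site_packages_symlinks.to_list():
    link = ctx.actions.declare_symlink(venv_site_packages + "/" + sp_path)
    ctx.actions.symlink(
        output = link,
        target_path = runfiles_relative_path(link, runfiles_path),
    )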

I haven't thought it through extensively or tried anything, yet. I mean, it sounds good at a high level. My optimism is tempered, though, because bootstrap=script has brought up more edge cases than I was hoping. It definitely would have gone better with some more eyes for design, exploration, and verification.

I was quite careful to avoid mentioning a "venv" in the original issue, to be honest. 😂 The new title is probably more appropriate for a PR that addresses some/all of the combined friction described in this issue, but I don't think it matters too much as long as this issue is easily discoverable.

While a venv might be some path towards a solution to the friction I describe, it's likely just an implementation detail. For example, when packaging python applications into docker containers or distroless (without bazel), you don't need to use a venv and you typically just install into the site-packages folder of the interpreter directly. Using a venv in the bootstrap doesn't directly solve any of the friction outlined in the original issue.

I'm not sure on the commentary about .pth files. When using a traditional site-packages or any folder marked as a "site dir" (site.addsitedir(dir)), .pth files are handled automatically by the interpreter and standard lib.

The same is true for Namespace packages. In a normal "site dir", no special handling is required.

re: venv not relevant to solving the site-packages issue: ok, good points, fair enough. A key part of the issue is that, right now, the various libraries aren't collected into a single location ("site-packages"). Collecting them like that is problematic because each binary can have a different set of dependencies. Hence I've closely associated venvs with the issue: the binary-specific venv, which has its own site-packages directory, is a natural place to collect them.

In any case, I have a PR almost ready for review. It has 3 key parts:

  1. PyInfo.site_packages_symlinks. This allows propagating up information about what should be put into site-packages.
  2. py_library.site_packages_root. This allows interpreting the files in srcs as following a site-packages layout; it feeds into the PyInfo field above
  3. In py_binary, the PyInfo field above is used to populate the site-packages directory in its venv.

Note that the above only applies to bootstrap=script. One of the neat things about this is it's pretty cheap and supports multiple dependency closures in a single build.
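
As a sketch of how part (2) is meant to be used (attribute name taken from the list above; exact semantics may shift):

py_library(
    name = "foo",
    srcs = glob(["site-packages/**/*.py"]),
    # Declare that srcs follow a site-packages layout rooted at this path,
    # which feeds PyInfo.site_packages_symlinks for consuming binaries.
    site_packages_root = "site-packages",
)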

Something I ran into as I implemented the above is how the pip repo rules handle namespace packages. Right now, they create a pkgutil style shim -- this confuses the build-phase logic into thinking they aren't namespace packages and causes conflicts trying to figure out what files to symlink. This should be easy to address, though.

I have several Nvidia libraries and I wanted to use 'from nvidia import nvimgcodec'
but multiple libraries have their own 'nvidia' directory under site-packages
pip_deps_cublas_cu11/site-packages/nvidia/ and pip_deps_nvimagecodec_cu11/site-packages/nvidia/

This seems problematic no matter what? Unless nvidia is a namespace package?
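
For reference, the PEP 420 behavior in question: if neither copy of nvidia/ ships an __init__.py, the interpreter merges them into one implicit namespace package; if either ships an __init__.py (a regular package), the first match on sys.path wins and shadows the rest. A quick way to check:

import nvidia

# For an implicit namespace package this prints a _NamespacePath spanning
# every sys.path entry containing an nvidia/ directory; for a regular
# package it prints a single directory.
print(nvidia.__path__)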

With #2617 merged, there's some basic/experimental support for populating a binary's venv's site-packages with stuff. It assumes using --bootstrap_impl=script. There's a bunch of edge and special cases to deal with still (see end of the issue description, where I've listed some), but please report bugs if you try it.

Here's a quick rundown of things.

Setting --@rules_python//python/config_settings:venvs_site_packages=yes will make pypi dependencies populate the site-packages directory. docs.
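
For example, in a .bazelrc (the bootstrap flag written out as its full label; adjust to your setup):

build --@rules_python//python/config_settings:bootstrap_impl=script
build --@rules_python//python/config_settings:venvs_site_packages=yes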

PyInfo.site_packages_symlinks docs is what powers things under the hood. This provider field allows creating symlinks in a binary's venv that point back to a runfiles path. Custom rules can provide their own PyInfo object with the value set. Note, however, that the field is still somewhat in flux (right now it's a 2-tuple, but I'm pretty sure it'll turn into a 3-tuple or struct). If people want to use it now, then we can change it to a struct sooner rather than later.
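
A minimal custom-rule sketch under the current 2-tuple shape (subject to the churn noted above; the runfiles path here is illustrative):

def _my_site_lib_impl(ctx):
    return [
        PyInfo(
            transitive_sources = depset(ctx.files.srcs),
            # (path under the venv's site-packages, runfiles path to point at)
            site_packages_symlinks = depset([
                ("mypkg", ctx.workspace_name + "/site-packages/mypkg"),
            ]),
        ),
    ]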

features.bzl%py_info_site_packages_symlinks docs can be used to detect the presence of this new API.

Astute readers may find py_library.experimental_venvs_site_package docs -- this is a special field set by the pypi-generation code to trigger the venvs-site-packages behavior. I don't think this attribute will stick around (unless people find it useful, I guess?). I just mention it because it's an easy way to experiment with this new behavior without having to go through pypi-generated code.

@rickeylev FWIW, I don't think it's possible for most real-world Python users to rely on / globally enable --bootstrap_impl=script, due to key roadblocks in that implementation. Is depending on that the long-term plan or just temporary?

The plan is to enable bootstrap=script by default. I've just been waiting for a release that didn't have any bug reports for it, so I can start that process. Last release (or the one before) met that criteria, so I've started to do that (see #2760 -- it'll be default for linux, initially).

If you know of bugs, please report them and tag me! AFAIK (well, as far as I remember 😅), there aren't any outstanding issues with it.

I see you edited out some of your comment :) ? PRs to system_python are welcome in the meantime -- how bootstrap=script works could mostly also be applied to bootstrap=system_python, but since it's slated for removal, I haven't put effort into that (for further discussion, let's start a separate thread, though).

The PR making bootstrap=script the default for non-windows is queued for merge now.

I started working on a script-based bootstrap using powershell: https://github.com/rickeylev/rules_python/tree/feat.bootstrap.script.windows

It's slow going because:

  • Bazel doesn't allow creating relative symlinks on Windows. Even if a custom action to create them is used, Bazel rejects them. Thus, the venv must be created at runtime or the venv must contain a full copy of everything.
  • venvs look a bit different on Windows (bin -> Scripts, lib/python{version}/site-packages -> Lib/site-packages)
  • The interpreter logic differs slightly (python.exe looks in Scripts/ for its supporting DLL files)
  • Powershell being a new language to me doesn't help
  • Probably other stuff

In any case, windows expertise welcome. The branch above has some WIP/repro notes.

re: populating venv and site packages:

Some of us were talking, and we think we'll replace PyInfo.site_packages_symlinks with something more generic: PyInfo.venv_symlinks. It'll be similar to before (a depset of tuples/structs), except each element will have an additional field, "type", which says which part of the venv it goes into. This is to allow populating the scripts/entry-points, includes, etc. parts of a venv (more than just site-packages), while abstracting away the platform- and version-specific details of venv construction.

# tuples of type, venv_path, runfiles_path
("bin", "cowsay", "pypi_cowsay/scripts/cowsay")
("site-packages", "cowsay", "pypi_cowsay/site-packages")
("includes", "", "pypi_cowsay/includes/cow.h")
# ->
$runfiles/
  pypi_cowsay/
    scripts/cowsay
    site-packages/...
    includes/...

  foo.venv/  # i.e. $venv
    bin/cowsay -> ../../pypi_cowsay/scripts/cowsay
    lib/pythonX.Y/site-packages/cowsay -> ../../pypi_cowsay/site-packages
    includes/cow.h -> ../../pypi_cowsay/includes/cow.h

(And on windows, the "bin" type would go into the Scripts dir, etc)

Something I ran into as I implemented the above is how the pip repo rules handle namespace packages. Right now, they create a pkgutil style shim -- this confuses the build-phase logic into thinking they aren't namespace packages and causes conflicts trying to figure out what files to symlink. This should be easy to address, though.

OK, for namespace packages, I started #2882, where I think we could have the following architecture:

  1. Add an attribute to py_library to create namespace pkg shims.
  2. Based on srcs and data files, pass it to the newly implemented function.
  3. Get back the list of dirs where we need to create shims.
  4. Use ctx.actions.expand_template to template the shims in those dirs (see the shim sketch after this list)
  5. Add the newly templated files to outputs.
  6. If the venv layout is requested, do none of the above.
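
For reference, the shim being templated in step 4 is essentially the classic pkgutil one-liner:

# __init__.py generated into each namespace-package directory
__path__ = __import__("pkgutil").extend_path(__path__, __name__)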

So for the next steps (after 2908 gets merged), here are my ideas:

  • we somehow need to add dist-info and data to the final venv, and it would be great to have some way to know where things from data should be symlinked.
  • Then generate console scripts in bin and add other stuff there.

EDIT: discussed this during the maintainers chat a little more.

So the main idea is that we will need to have some sort of path mapper from the extracted whl_library layout to the venv. This means that we will also need to have py_library be aware of all of the things that could possibly go into the venv.

Initial ideas here are to use the ideas discussed in #1730 to make py_library return a provider or something similar, so that not all paths are equal.

Did you consider creating a statically linked binary instead of a script (#2500)? This would allow packaging a py_binary into an oci_image with an empty/distroless base.

I think the general consensus is that we would eventually like to have a statically linked binary loader and we are using a script for easier prototyping for now.

I think the general consensus is that we would eventually like to have a statically linked binary loader and we are using a script for easier prototyping for now.

All the statically linked binary needs to do is know the path to the Python interpreter and the Python bootstrap script relative to itself / using the runfiles library. Then all of the remaining logic could be implemented in Python (which seems a more natural fit for rules_python than shell or C++). So migrating to a shell script now and then translating that back to Python later seems like a lot of additional work.

@mering, #2500 is the issue that is for the statically linked loader, let's discuss how to do it there.

FYI @arrdem -- PyInfo.venv_symlinks is implemented. It allows populating the bin/ directory. I didn't have time to add support for include/, but that'd be simple to add.

Do we need support for include? I think we probably just want a filegroup with the includes in it. That said, I see that numpy under rules_python does not install its headers outside the site-packages folder, which suggests that where the headers end up may be package-dependent.

Not proud of it, but for now I went with something like this

  • given #3228 (comment) is still in progress
  • Another issue I need to deal with before switching to the venv layout is that we have a launcher for running targets in the VS Code debugger (it just queries the entry point and runs it). With the venv layout, since the launcher is now bash, that workflow breaks.
# py_binary.bzl

load("@rules_python//python:defs.bzl", _py_binary = "py_binary")

def py_binary(name, srcs, **kwargs):
    """
    A custom py_binary rule that replicates the behavior of the standard py_binary rule
    with additional functionality to modify the LD_LIBRARY_PATH for specific dependencies.

    Args:
        name: Name of the binary rule.
        srcs: List of source files for the binary.
        **kwargs: Additional keyword arguments passed to the underlying py_binary rule.

    Issue we are working around: https://github.com/bazel-contrib/rules_python/issues/2156
    """

    # Replicate py_binary behavior to find main
    main = kwargs.pop("main", None)
    if len(srcs) == 1 and not main:
        main = srcs[0]
    if not main:
        fail("main must be provided, or length of srcs should be one.")

    # Create the shim starting point that will modify LD_LIBRARY_PATH for torch
    # and friends.
    native.genrule(
        name = name + ".shim",
        srcs = [Label(":pybinary_main.py")],
        outs = [name + ".shim.py"],
        cmd = """
        sed "s/__ACTUAL_MAIN__/{}/g" $(SRCS) > $(OUTS)
        """.format(main),
    )

    # Now make a py_binary with the new actual main.
    new_main = ":{}.shim".format(name)
    _py_binary(
        name = name,
        srcs = srcs + [new_main],
        main = new_main + ".py",
        deps = kwargs.pop("deps", []) + [Label(":ld_library_path")],
        **kwargs
    )
# pybinary_main.py

import os
import sys
from pathlib import Path

from bazel_infra.py_rule_shims.ld_library_path import patch_ld_library_path

ACTUAL_MAIN = "__ACTUAL_MAIN__"

if __name__ == "__main__":
    patch_ld_library_path()
    entrypoint = str(Path(__file__).parent / ACTUAL_MAIN)
    # Sadly this does not work, and we need to start new process :(
    # runpy.run_path(entrypoint, run_name="__main__")
    os.execv(sys.executable, [sys.executable, entrypoint, *sys.argv[1:]])
# ld_library_path.py

"""
A script that, on Linux, fixes LD_LIBRARY_PATH to work around the torch import issue.
"""

import itertools
import os
import sys
from pathlib import Path

# These are NVIDIA packages that ship .so files.
# Somehow the last one is not even placed under the nvidia namespace.
_TARGET_PREFIXES = ["nvidia", "cusparselt"]


def patch_ld_library_path():
    # sys.platform is "darwin" on macOS (it is never "macos"); comparing via
    # a variable keeps mypy's platform narrowing from tripping.
    darwin = "darwin"
    if sys.platform == darwin:
        # This is due to NVIDIA libraries and only happens on linux
        return

    # Iterate over each .so and find .so files with desired prefix
    ld_paths = os.getenv("LD_LIBRARY_PATH", "").split(":")
    seen = set(ld_paths)

    for path_str in sys.path:
        path = Path(path_str)
        so_files = itertools.chain(*[path.glob(f"{prefix}/**/*.so.*") for prefix in _TARGET_PREFIXES])
        for file in so_files:
            parent = str(file.resolve().parent)
            if parent not in seen:
                seen.add(parent)
                ld_paths.append(parent)

    # Update the LD_LIBRARY_PATH environment variable
    os.environ["LD_LIBRARY_PATH"] = ":".join(ld_paths)