bazelbuild/rules_python

Allow accessing requirements directly by labels

person142 opened this issue ยท 10 comments

๐Ÿš€ feature request

Relevant Rules

py_binary, py_library, and py_test

Description

Per the README accessing requirements by labels is not supported:

To avoid breakage, please use requirement() instead of depending directly on wheel repo labels.

The requirements also have somewhat unintuitive labels since they prepend pypi__ onto the (sanitized) package name. Not working with labels directly has some consequences:

  • It doesn't work with buildifier (bazelbuild/buildtools#955)
  • It doesn't work with buildozer (bazelbuild/buildtools#565 (comment))
  • In the next release it will be hard to work with all_whl_requirements without digging into the requirements workspace, so the abstraction is leaky. (Though uncommon, I find I occasionally need to dig through the requirements workspace in some other settings too.)
  • It's fairly obfuscated; i.e. it adds an extra layer on top of the normal mechanisms for referring to targets from external workspaces
  • It doesn't scale well-there used to be requirement, now there's whl_requirement, if you started wrapping console scripts then there would need to be a new function for that, etc. etc.

Describe the solution you'd like

Is it possible to:

  • Shorten the labels so that it's @<workspace>//<sanitized-package-name> (For the short form of the label.)
  • Officially support working with the labels directly

Describe alternatives you've considered

We don't enforce that Python deps be sorted. When I need to buildozer a requirement I do some hacky find-replace to stringify it and then unstringify it.

Ok, so the reason for the status quo is mentioned in the wheel extracting code:

Further, rules-python automatically adds the repository root to the PYTHONPATH, meaning a package that has the same name as a module is picked up. We workaround this by prefixing with pypi__. Alternatively we could require --noexperimental_python_import_all_repositories be set, however this breaks rules_docker. See: bazelbuild/bazel#2636

This happens inside the shim for launching a py_binary:

# Returns repository roots to add to the import path.
def GetRepositoriesImports(module_space, import_all):
  if import_all:
    repo_dirs = [os.path.join(module_space, d) for d in os.listdir(module_space)]
    repo_dirs.sort()
    return [d for d in repo_dirs if os.path.isdir(d)]
  return [os.path.join(module_space, '__main__')]

the thing I'm not sure of is why the external repository root is added to the Python path-that seems undesirable? (And here forces us to add prefixes to things to work around it.)

Thanks for the issue @person142, this is an important misalignment with the rest of the rules ecosystem.

Checking this locally for myself, I can indeed run a py_binary and see an entry in the sys.path like:

'/private/var/tmp/_bazel_jonathon/06afea7962d8d117f8909d0040da389c/execroot/rp_foo/bazel-out/darwin-fastbuild/bin/tools/bar/main.runfiles/pypi',

where @pypi is the external repository created by pip_install.

It does seem undesirable that this ends up on the PYTHONPATH. We need to track down why it ended up like this.

One interesting thing is that pip_parse sidesteps the issue since everything is unpacked into the top level of the generated repo. Currently if you do something like

pip_parse(
   name = <name>,
   requirements_lock = "//:requirements.txt",
)

then the generated repos will have names like <name>_pypi__<package>, but that could be shortened to <name>_<package> or even just <package> (so long as the user takes care to avoid name collisions).

I'm working on a PR to make the prefix configurable so that the above shortened forms could be tried out.

Aforementioned PR: #459.

And working at it from the other end, bazelbuild/bazel#2636 (comment) on the Bazel side.

So if we can get the --incompatible_repo_roots_are_not_on_pythonpath flag added and move ahead with #459, then users can flip the flag and adjust the prefixes to be nicer, which would get us started on the process of improving the situation.

We probably want something pretty close to what rules_jvm_external provides which is an external repo per external dep and then each external dep is exposed to users via a single main external repository. So theres:

  • "@maven//:org_mockito_mockito_core" (user facing)
  • @org_mockito_mockito_core_3_10_0//file:file (hidden from user)

If an alias rule is used, does this ensure pip_parse still sidesteps the the issue? I haven't checked.

The requirements.bzl helper produced by pip_install has another critical flaw not mentioned earlier: it eager-fetches all the wheels from pypi when the load() statement is evaluated. So if the user has a BUILD file with a py_library and a go_library, and only asks bazel to compile the Go code, they'll still have to wait for pip to install the complete set of requirements.

At a minimum I think we should stop saying "always use the requirement() helper" and instead say "you can use the __pypi names but note that a future release could change these, in that case we'll note it as a breaking change and give you a buildozer command to easily update your code"

The requirements.bzl helper produced by pip_parse on the other hand is fine from this eager-fetching perspective.

@person142 should this be closed now that we have landed the change to pick your prefix for pip_parse?

Yes, the situation for pip_parse is quite nice now; I've switched to using labels directly.

Update: I'm guessing this is what incompatible_generate_aliases = True is for, though not sure what it's incompatible with. Unfortunately it still isn't working for me but I think I must be doing something wrong.

Update 2: the new label format under incompatible_generate_aliases is @pip//pytest/@pip//pytest:pkg, etc

Is just writing the label still supposed to be supported under bzlmod? If not, should the "Using third_party packages as dependencies" section of the README be updated to recommend against doing this?

I see requirement() under bzlmod is now generating labels like @@rules_python~0.20.0~pip~pip//:pytest_pkg, and all of my rules depending on things like @pip_pytest//:pkg are totally broken. Hope I am just doing something wrong...

With

pip = use_extension("@rules_python//python:extensions.bzl", "pip")

pip.parse(
    name = "pip",
    requirements_lock = "//:requirements_lock.txt",
)

use_repo(pip, "pip")