bazelbuild/rules_python

Packages imported with pip/requirements infrastructure cause import issues

cpatrick opened this issue · 29 comments

For example, importing google.auth blocks importing google.storage, which in turn blocks google.bigquery, etc.

Here's a minimal reproduction: https://github.com/cpatrick/minimal_bazel_python_bug_repro.

Maybe I'm misunderstanding a feature or option I could be using?

With zero context, I suspect this is caused by this issue, but I can take a look to confirm after lunch.

Thank you for the repro!

Interestingly that does not seem like the same issue.

@duggelz the repro prints out sys.path, which seems to include the root of the bigquery whl's repository, so I don't understand why the binary can't load it. Do you have any insight here?

So, my initial guess is that, since multiple packages contain a google root package and the approach taken adds one directory per package to sys.path, the import system finds the first google package and doesn't search the rest of sys.path.
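To illustrate that guess, here's a minimal, self-contained sketch (all directory names are made up) of what happens when each wheel's extracted repository gets its own sys.path entry and each contains a plain `google/__init__.py`:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# Two fake "wheel repositories", each contributing its own google/ tree,
# the way pip_import lays them out (names are hypothetical).
for repo, sub in [("pypi__google_auth", "auth"),
                  ("pypi__google_cloud_bigquery", "bigquery")]:
    pkg = os.path.join(root, repo, "google", sub)
    os.makedirs(pkg)
    # Plain empty __init__.py files, like the ones Bazel generates.
    open(os.path.join(root, repo, "google", "__init__.py"), "w").close()
    open(os.path.join(pkg, "__init__.py"), "w").close()

sys.path[:0] = [os.path.join(root, "pypi__google_auth"),
                os.path.join(root, "pypi__google_cloud_bigquery")]

import google.auth  # resolved from the first sys.path entry: works

try:
    import google.bigquery  # lives only in the second entry, never searched
    found = True
except ImportError:
    found = False

print("google.bigquery importable:", found)
```

Because the first entry's `google` is a regular package (it has an `__init__.py`), its `__path__` contains only that one directory, so subpackages shipped by later sys.path entries are invisible.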

A solution could be to combine all of the requirements-installed python packages into a single dist-packages-style folder, but that's a little bit scary...

I'm hoping y'all have better ideas. 😸

@cpatrick Yeah, that would be trouble if that's what's going on.

If that's truly what's going on, then yes, we might be able to work around it in the pip rules, but it still represents a somewhat fundamental problem with how the imports=[] kwarg manifests (I assume that usage is what creates these sys.path entries).

Perhaps we need a better solution for declaring a library to be part of the top-level namespace than imports=["."], and py_binary should handle it better than just stacking sys.path (perhaps it should blend these into the runfiles or something?).
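For reference, this is roughly the kind of generated BUILD fragment being discussed (target name and glob are assumptions, not the actual generated output): `imports = ["."]` puts the external repository root on sys.path at runtime, one entry per wheel.

```python
# Hypothetical BUILD fragment for one extracted wheel. The
# imports = ["."] attribute is what adds this repository's root
# directory, containing its own google/ tree, to sys.path.
py_library(
    name = "google_cloud_bigquery",
    srcs = glob(["google/**/*.py"]),
    imports = ["."],
)
```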

Thinking out loud...

Ah, thanks for the pointer and explanation. Yeah, I think that's what happens. The imports=["."] approach is a bit too much of a hammer.

Blending into runfiles the way pip/setuptools does in site-packages feels like the option most consistent with the standard approach. Looking at my virtualenv for the project I'm bazel-ifying (or at the requirements in my repro case), that's what you get: a single tree with google/auth, google/cloud/storage, google/cloud/bigquery, etc., all from different packages.

The google package is complicated. Basically, every package like google.cloud.foo, google.protobuf, or google.tonsofotherstuff defines its own "google" package, and the first one wins. So every package has to carefully construct its "google" package to do the right thing if it happens to be the "winner", which can easily change based on almost anything. This doesn't always happen: unfortunately, we don't have a central authority that "owns the google package".

So either 1) subpar doesn't support the right pkg_resources and pkgutil calls, which is entirely possible, or 2) the "winning" google package happens to be one that doesn't correctly initialize itself in the maximally friendly way.

I'll look into it further.

Here it is just py_binary, not even PAR :-/

Hi folks. Any update on this? Let me know if there's anything I can do to help reproduce or even contribute to fixing the issue.

Any workarounds for this?

@damienmg Anyone from the Bazel/Python side we can include here?

drigz commented

The issue appears to be linked to the presence of empty __init__.py files inside the various google/ directories. These don't do the "right thing" described by @duggelz, which involves modifying __path__ so that the other directories are searched for subpackages.
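For context, the "right thing" here is the classic pkgutil-style namespace `__init__.py`. A runnable sketch (directory and subpackage names are made up) showing that writing it into each `google/__init__.py`, instead of leaving the file empty, makes subpackages from every sys.path entry importable:

```python
import os
import sys
import tempfile

# The standard pkgutil namespace-package recipe: extend __path__ with
# every other google/ directory found on sys.path.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

root = tempfile.mkdtemp()
for repo, sub in [("repo_auth", "auth2"), ("repo_bigquery", "bigquery2")]:
    pkg = os.path.join(root, repo, "google", sub)
    os.makedirs(pkg)
    # Namespace-aware __init__.py instead of an empty one.
    with open(os.path.join(root, repo, "google", "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    open(os.path.join(pkg, "__init__.py"), "w").close()

sys.path[:0] = [os.path.join(root, "repo_auth"),
                os.path.join(root, "repo_bigquery")]

import google.auth2      # from the first entry
import google.bigquery2  # from the second entry: works, __path__ was extended

print(len(google.__path__) >= 2)
```

Whichever copy of `google/__init__.py` "wins" runs `extend_path` and picks up the google/ directories from the losing sys.path entries, so subpackages from all of them resolve.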

By modifying the runfiles after the build (specifically, replacing those files with this one) I could get the google modules to import (although it still fails with No module named concurrent.futures, which I haven't looked into yet).

The empty __init__.py files are created by Bazel if your package doesn't already have them. Maybe we need an optional flag for py_library that marks the library as a "namespace library" and puts the right thing into __init__.py? Or pip_import could create these files?

drigz commented

I was able to work around the issue in 5f78b4a, which generates __init__.py files with namespace support. However, I'm not sure this is the right path, and I haven't tested extensively yet.

The bit I like least is hardcoding that googleapis-common-protos' empty google/api/__init__.py should be overwritten.

drigz/minimal_bazel_python_bug_repro@9644328 shows the changes necessary to use this, if anyone else is interested in trying it out.

My description above was actually somewhat incorrect, since the namespace packages are handled in a different way than I thought. I'm experimenting with a solution.

I haven't forgotten about this, but it's a little tricky.

drigz commented

No problem, I've been getting on OK with the workaround.

I'm curious; what approach are you trying? Will it link/copy the files into a single site-packages style tree?

drigz commented

@duggelz Would you like me to create a pull request with my workaround? Looks like a couple of users have cherry-picked it.

I found this issue via the Google Groups thread here: https://groups.google.com/forum/#!topic/bazel-discuss/NMO6KPyPKh4. Isn't this related to issue #55? (Sorry, I didn't see this issue before I created mine.) There I describe the root cause and what we can do about it, and I also have a workaround.

This is indeed a dup of #55. However, your writeup in #55 is excellent.

What's the current status of this? Between all the activity here and on #55, I'm kind of lost. However, it does still appear to be a problem in some situations.

I managed to work around it with a combination of legacy_create_init = False and creating a separate __init__.py for the namespace package which manually assigns __path__ with hard-coded absolute paths. That's really ugly though.
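For anyone else hitting this, here's a sketch of the workaround just described (all paths and names are hypothetical): set `legacy_create_init = False` on the py_binary so Bazel stops generating the empty `__init__.py` files that shadow the namespace package, then check in a hand-written `google/__init__.py` along these lines:

```python
# google/__init__.py, written by hand. The hard-coded absolute paths
# are machine- and workspace-specific, which is what makes this ugly.
__path__ = [
    "/home/me/.cache/bazel/.../pypi__google_auth/google",
    "/home/me/.cache/bazel/.../pypi__google_cloud_bigquery/google",
]
```

Manually assigning `__path__` forces the import system to search every listed directory for `google.*` subpackages, regardless of which sys.path entry "won".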

My specific problem is with zope-interface, which doesn't have a zope/__init__.py file at all, which means pkgutil.extend_path doesn't even pick it up. I'm not quite sure how it's supposed to work with python2 actually...

drigz commented

@brian-peloton My (incomplete) understanding is that this will be resolved as part of Operation Purple Boa, but a near-term solution is not planned. https://github.com/TriggerMail/rules_pyz offers an alternative set of rules that may work better for you than the current rules_python.

#55 has more information (especially on the importance of .pth files for zope) but was closed after the introduction of legacy_create_init.

Adding @brandjon, the new proud owner of our Python support.

ali5h commented

Using backports.functools_lru_cache seems to cause the same issue; it uses pkgutil.extend_path.

So, just to be clear, there's no current support for using protobuf and google-cloud packages with these rules?

rules_python 0.1.0 has been released, which upstreams the rules_python_external repo. Please switch from pip_import to pip_install, which doesn't have this issue.

@alexeagle any chance there is an example of that switch? I am new to Bazel and am having trouble solving this.

Yeah, look at the examples in this repo. Sorry no one has had time to write a migration guide. Maybe @AutomatedTester can help.

I can maybe have a look at docs next week, but @MushuEE, SeleniumHQ/selenium@40689b3 is my commit that moved us to the 0.1.0 rules, to get you moving sooner.
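For anyone landing here, a minimal sketch of the WORKSPACE change (the repository name `pip_deps` is an assumption; see the examples in this repo for the real thing):

```python
# Before: pip_import plus the per-wheel repositories that trigger this issue.
#
#   load("@rules_python//python:pip.bzl", "pip_import")
#   pip_import(
#       name = "pip_deps",
#       requirements = "//:requirements.txt",
#   )
#   load("@pip_deps//:requirements.bzl", "pip_install")
#   pip_install()

# After: rules_python 0.1.0's pip_install, which installs all wheels
# into a single site-packages-style tree.
load("@rules_python//python:pip.bzl", "pip_install")

pip_install(
    name = "pip_deps",
    requirements = "//:requirements.txt",
)
```

BUILD files keep using `load("@pip_deps//:requirements.bzl", "requirement")` and `deps = [requirement("google-cloud-bigquery")]` as before.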