Change of --output_base causes subsequent builds to fail
konste opened this issue · 18 comments
Description of the problem / feature request:
Change in --output_base parameter causes the build to break with very obscure error messages.
Feature requests: what underlying problem are you trying to solve with this feature?
When a build is run from some IDEs (VSCode or IntelliJ), they tend to set their own --output_base, different from the one the user configured for command-line builds. This should not cause any problems, but unfortunately it appears that the build breaks after --output_base changes.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Take any simple Bazel project. Build it with bazel --output_base=C:/blah1 build ... so that it builds successfully. Then try to build it with bazel --output_base=C:/blah2 build ... and this time it breaks with error messages that don't make any sense.
What operating system are you running Bazel on?
The problem does not seem to be OS dependent.
What's the output of bazel info release?
2.0.0
Any other information, logs, or outputs that you want to share?
I figured that the problem is most probably caused by the stale "courtesy" symlink in the WORKSPACE folder. After the first build, Bazel creates a "courtesy" symlink such as bazel-<workspace_name> in the workspace folder, and it points inside output_base. When we issue the second build command with a different output_base, Bazel is smart enough to realize that and spawn a second build server process, but unfortunately the stale bazel-<workspace_name> symlink stays and still points inside the old output_base, which seems to confuse Bazel. Running bazel clean between builds, or simply deleting that symlink, fixes the problem. It looks like when Bazel discovers a change in startup parameters that warrants spawning a new build server, it should at the same time remove the existing courtesy symlinks, as they are no longer valid and cause the build to fail.
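The manual cleanup described above could be scripted roughly as follows. This is only a sketch under some assumptions: the function name remove_stale_symlinks is made up for illustration, the symlinks use the default bazel- prefix, and each symlink target is an absolute path. It deletes the symlinks that do not point into the output base about to be used and leaves the others alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bazel): remove bazel-* convenience
# symlinks in a workspace that do not point into the given output base.
remove_stale_symlinks() {
  local workspace="$1" output_base="$2" link target
  for link in "$workspace"/bazel-*; do
    # Skip anything that is not a symlink, including an unmatched glob.
    [ -L "$link" ] || continue
    # Read the raw link target without resolving it further.
    target="$(readlink "$link")"
    case "$target" in
      "$output_base"/*) ;;  # points into the current output base: keep it
      *) rm -f "$link"; echo "removed stale symlink: $link" ;;
    esac
  done
}
```

It would be called right before the build, for example remove_stale_symlinks "$PWD" /tmp/two followed by bazel --output_base=/tmp/two build //... from the workspace root.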
Reproduced with Bazel 2.0:
03:30:25 /tmp/ws
$ cat BUILD
genrule(
name = "g",
outs = ["g.txt"],
cmd = "touch $@",
)
03:30:28 /tmp/ws
$ cat WORKSPACE
03:30:31 /tmp/ws
$ bazel --output_base=/tmp/one build //... && bazel --output_base=/tmp/two build //...
INFO: Analyzed target //:g (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:g up-to-date:
bazel-bin/g.txt
INFO: Elapsed time: 0.119s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
ERROR: error loading package 'bazel-ws/external/bazel_tools/tools/build_defs/pkg': Label '//tools/python:private/defs.bzl' is invalid because 'tools/python' is not a package; perhaps you meant to put the colon here: '//:tools/python/private/defs.bzl'?
INFO: Elapsed time: 0.219s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (12 packages loaded)
currently loading: bazel-ws/external/bazel_tools/tools/test/CoverageOutp\
utGenerator/java/com/google/devtools/coverageoutputgenerator ... (2 packages\
)
Fetching @rules_java; fetching
I'm seeing something very similar, but even more basic when using the latest pre-release vscode-bazel (which sets --output_base by default). vscode-bazel runs this command:
bazel --output_base=/tmp/ee79067f914abe58284ab7a8abdc7f7d query ...:* --output=package
It fails with the following output:
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Call stack for the definition of repository 'rules_cc' which is a http_archive (rule definition at /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/http.bzl:292:16):
- /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/utils.bzl:205:9
- /DEFAULT.WORKSPACE.SUFFIX:302:1
ERROR: error loading package 'bazel-sdk/external/bazel_tools/third_party/jarjar': Label '//tools/jdk:remote_java_tools_aliases.bzl' is invalid because 'tools/jdk' is not a package; perhaps you meant to put the colon here: '//:tools/jdk/remote_java_tools_aliases.bzl'?
If I remove the --output_base, the query succeeds just fine. If I change the --output_base to --output_user_root, the query succeeds just fine. It's almost like bazel cannot modify things in install_base when outputBase has been specified like that. Perhaps the query is sandboxed and overriding output_base is causing install_base to not be in the sandbox? Just wild-guessing.
For reference, I've seen a number of different but very similar errors and they all involve downloading http_archive rules (or similar download rules) for setting up a workspace prior to running a query. I think the intent of output_base was to not disturb the main workspace environment (and thus be able to run concurrently), but maybe the install_base is write-only in that case? For what it's worth the install_base HAD been populated with the downloads prior to running the query with --output_base, so I'm guessing the need to re-download is because changing output_base invalidated something. Maybe that's a clue?
Is anyone actively working on this? This is a pretty annoying bug that affects how e.g. Jenkins build nodes have to be spawned, because they cannot handle multiple executors being isolated with output_base. I'd be willing to investigate a fix but am not familiar enough with bazel's codebase to even know where to start looking. Any pointers would be appreciated :)
I found a workaround, but I don't know why it works:
- Let $WORKSPACE=app
- Remove the existing symlink
bazel-app
- Replace with a dummy file
echo "dummy" >| bazel-app
- The builds no longer fail, but you get a warning:
failed to create one or more convenience symlinks for prefix 'bazel-'
I tried setting --symlink_prefix and --experimental_use_sandboxfs, but it did not work. It's as if 'bazel-' is hardcoded somewhere.
I hope that helps.
The basic problem here is that the bazel-$WORKSPACE convenience symlink is blindly traversed by Bazel: you can even do bazel build //bazel-$WORKSPACE/... and it will load the labels without any issues, and even build the targets there if you're lucky. When you switch output bases, the symlinks are broken, but the fundamental issue is visiting that convenience symlink in the first place.
My suggestion for a workaround for now is to do echo "bazel-$WORKSPACE" >> .bazelignore. The .bazelignore file in the root of the workspace tells Bazel not to consider those directories. Of course, Bazel should be smart enough to not consider them on its own.
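Since the echo ... >> .bazelignore command appends a new line every time it is run, a slightly more careful variant may be handy in setup scripts. The sketch below is an assumption-laden illustration: add_bazelignore_entry is a made-up name, and it only adds the entry if that exact line is not already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an entry to <workspace>/.bazelignore
# only if the exact line is not already there.
add_bazelignore_entry() {
  local workspace="$1" entry="$2"
  local file="$workspace/.bazelignore"
  touch "$file"
  # -q quiet, -x match the whole line, -F treat the entry as a fixed string.
  if ! grep -qxF -- "$entry" "$file"; then
    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new entry does not get glued onto the last line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$entry" >> "$file"
  fi
}
```

For a workspace named app, this would be called as add_bazelignore_entry "$PWD" "bazel-app" from the workspace root.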
cc @mhy1992
I found that janakdr's solution works on a simple Bazel project with very few external dependencies, but bazel build TARGET still fails on complex projects like serving.
Basic question for people experiencing this issue: can the "alternate" output_root invocation just pass --symlink_prefix=/? If the invocation is being done by an automated process, it shouldn't need those convenience symlinks at all, and it means that your build outputs would remain easily accessible even without doing another build.
At least for me, neither --symlink_prefix=/ nor --experimental_convenience_symlinks=ignore improves anything.
Does passing --experimental_no_product_name_out_symlink help?
Not for me. In fact, with that option, I get even more error messages.
The workarounds don't work for me either. Deleting the symlinks did not help, nor did passing in the flags mentioned. I get error messages of the form:
ERROR: error loading package 'my_bazel_output/external/bazel_skylib': cannot load '//:bzl_library.bzl': no such file
I am wondering what state is causing Bazel to get confused even when the symlinks have all been deleted.
I also see errors of the form:
ERROR: error loading package 'my_old_bazel_output/external/rules_python/python/runfiles': Label '//python:defs.bzl' is invalid because 'python' is not a package; perhaps you meant to put the colon here: '//:python/defs.bzl'?
This suggests that somehow the new output folder is looking for files in the old one through the symlinks. Perhaps I missed deleting some symlinks. Deleting my_old_bazel_output and restarting the bazel server seems to fix the problem but that prevents the ability to have two or more concurrent build folders that the user can switch between.
This issue was surprising to me, given the Bazel output directory layout page explicitly mentions:
The symlinks for “bazel-”, “bazel-out”, “bazel-testlogs”, and “bazel-bin” are put in the workspace directory; these symlinks point to some directories inside a target-specific directory inside the output directory. These symlinks are only for the user’s convenience, as Bazel itself does not use them. Also, this is done only if the workspace directory is writable.
EDIT: Ignore my comment. I triaged my particular issue to #13601.
For those who are having this failure with the VSCode Bazel Extension with an error along the lines of:
Command failed: bazel --output_base=/var/folders/z4/tjbsqpbj5pz3_mh9jzb4vhxw0000gn/T/5dc9f0e710b2578b352c533d7f70060e query ...:* --output=package
Loading: 0 packages loaded
ERROR: error loading package '': at /Users/--------/go/src/gitlab.com/-----/-----/build/go/-----.bzl:3:6: Every .bzl file must have a corresponding package, but '@bazel_gazelle//:deps.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
Loading: 0 packages loaded
Loading: 0 packages loaded
For me, this problem was rectified by opening the VSCode Bazel extension settings, finding the "Queries Share Server" checkbox, and ensuring it is ticked (unticked by default).
The box has the description: "Use the same Bazel server for queries and builds. By default, vscode-bazel uses a separate server for queries so that they can be executed in parallel with builds. You can enable this setting if running multiple Bazel servers has a negative performance impact on your system, but you may experience degraded performance in Visual Studio Code for operations that require queries."
I think the root cause is probably related to this issue, because (I think) to get the separate server the --output_base argument is provided, which then runs into the issue you describe here.
For those coming from the future: You'll also get this sort of error if the workspace name in your WORKSPACE file differs from the enclosing directory name.
e.g. if you have foo/WORKSPACE and that file contains 'workspace(name="bar")' you will get weird and confusing errors.
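A quick way to check for that mismatch might look like the sketch below. It is an illustration only: workspace_name_matches_dir is a made-up name, and the sed pattern only understands a single-line, double-quoted workspace(name = "...") declaration, so anything more elaborate in the WORKSPACE file is treated as if no name were declared.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the name declared in <dir>/WORKSPACE
# equals the directory's own name, or if no name is declared at all.
workspace_name_matches_dir() {
  local dir="$1" declared
  # Extract NAME from a line like: workspace(name = "NAME")
  declared="$(sed -n 's/^[[:space:]]*workspace([[:space:]]*name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$dir/WORKSPACE" | head -n 1)"
  # Nothing recognised: there is nothing to compare against.
  [ -n "$declared" ] || return 0
  [ "$declared" = "$(basename "$dir")" ]
}
```

With the foo/WORKSPACE example above, workspace_name_matches_dir foo would return a non-zero status, because the declared name bar differs from the directory name foo.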
FYI this is the most commonly reported issue on the vscode bazel extension, and there is no clear workaround for it while using a distinct output base.
I initially thought we could pass --deleted_packages, but this doesn't take package prefixes, so the only way of making this work would be to enumerate all packages in the bazel-* directories.
Possibly the issue lies here:
A fix for this issue has been included in Bazel 7.1.0 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.1.0rc2. Thanks!
Hi, I found this issue is still not properly handled: it still misses the case where the output_base is inside the workspace.
To reproduce, just configure output_base as bazel-cache and build the //... target.
Please help to also support this case if you don't mind.
@LittleCuteBug do you have a reproducible code or repo?