Windows feature generate_pdb_file should be applicable to opt builds
sunjayBhatia opened this issue · 23 comments
Description of the problem / feature request:
We would like to generate .pdb
files for release builds of Envoy for Windows for use debugging potential production issues. This works well with Bazel building locally as all intermediate/compiler/linker generated files live on the host. We use opt
mode, the MSVC cl /Z7
option, and linker options /DEBUG:FULL
. However, building Envoy with RBE in CI, the expected .pdb
file output is not accessible. It is not an explicit output of the cc_binary
rule on Windows and is not downloaded to the host running the build (or presumably any rules that were to depend on the binary target, not sure if a genrule
workaround could work to grab the relevant file). This is in contrast to Linux where cc_binary
rules create an implicit <target>.dwp
target that can be used to access debug info. Using --remote_download_outputs=all
(the default) or --features=generate_pdb_file
does not work as the feature is only enabled for dbg
and fastbuild
compilation modes.
It would be great if the restriction of the feature to dbg
and fastbuild
modes was relaxed
Feature requests: what underlying problem are you trying to solve with this feature?
See above, generate .pdb
files as when building Envoy with RBE in opt
compilation mode.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Build a cc_binary
rule in opt
compilation mode and link with MSVC link /DEBUG
options to generate a .pdb
file in RBE on Windows. See that no .pdb
file is downloaded even when using --features=generate_pdb_file
or --remote_download_outputs=all
.
What operating system are you running Bazel on?
Windows (2019)
What's the output of bazel info release
?
release 3.6.0
What version of rules_cc do you use? Can you paste the workspace rule used to fetch rules_cc? What other relevant dependencies does your project have?
What Bazel options do you use to trigger the issue? What C++ toolchain do you use?
C++ toolchain: https://github.com/envoyproxy/envoy-build-tools/tree/master/toolchains/configs/windows/msvc-cl/bazel_3.6.0 (generated via bazel-toolchains
)
Have you found anything relevant by searching the web?
Not as of yet
Any other information, logs, or outputs that you want to share?
Not as of yet
While this is describing what should be emitted as a result of linking, there is a parallel PR concerned with emitting the .pdb results corresponding to each .obj file, and then compressed into a library-scope .pdb using fission at lib construction time.
It would be nice if the underlying bazel rules_cc mechanics follow similar toggles. #78
Update: the following about pdb file and linking output is not entirely correct, please see #94 (comment)
The pdb file is generated by the link action but not in the default output group of cc_binary (therefore not retrieved in RBE build). You can access it by specifying the output_group in the following ways:
- In command line
bazel build //:hello-world --output_groups=pdb_file
(--output_groups=pdf_file,default
if you want both the binary and the pdb file) - In BUILD file
filegroup(
name = "my_pdb_file",
srcs = ["//:hello-world"],
output_group = "pdf_file",
)
Can you try if this fixes the issue with RBE?
Using the option 2. above, we were successful at envoy in adding an envoy-static-exe target that captures both the default and the pdf_file group. [Option 2 was only an illusion, I'd presumed the absence of an .exe indicated that the .pdb was emitted, which you note later it was not.] Thank you for the guidance! Documenting this for future reference could be helpful?
Documenting this for future reference could be helpful?
Yes, we should definitely improving the documentation on this!
@meteorcloudy I feel like I'm missing something obvious. In this workaround...
envoy_cc_binary(
name = "envoy-static",
stamped = True,
deps = [":envoy_main_entry_lib"],
)
filegroup(
name = "envoy-static-pdb",
srcs = [":envoy-static"],
output_group = "pdb_file",
)
filegroup(
name = "envoy-static-exe",
srcs = [
":envoy-static",
":envoy-static-pdb",
],
)
The envoy-static-pdb does not capture the .exe... but grouping them together in envoy-static-exe does not capture the .pdb.
What I think I'm reaching for is
output_group = ["default", "pdb_file",],
but there is obviously no such construct.
@meteorcloudy although we wanted this structed at the BUILD file level, I have experimented with your top-line solution.
Alternative 1. listing both file groups for the target produced no .pdb result from our invocation of
bazel [...] build [...] //bazel/... //source/exe:envoy-static --output_groups=pdb_file,default --build_tag_filters=-skip_on_windows
where envoy-static is the cc_binary mentioned above, and which does not generate a .pdb file (I've tried both pdf_file and what i think you meant based on the tooling sources, pdb_file.)
https://github.com/envoyproxy/envoy/pull/16043/files
So the first solution may not work under distributed RBE either?
Hmmm, I tested locally with a hello-world cc_binary, it should be working. Can you also confirm if it works for you in a local build?
pcloudy@PCLOUDY2-W MSYS ~/workspace/my_tests/bazel
$ git diff
diff --git a/examples/cpp/BUILD b/examples/cpp/BUILD
index f450f67242..ad7e7cf70c 100644
--- a/examples/cpp/BUILD
+++ b/examples/cpp/BUILD
@@ -14,6 +14,20 @@ cc_binary(
deps = [":hello-lib"],
)
+filegroup(
+ name = "hello-world-pdb",
+ srcs = [":hello-world"],
+ output_group = "pdb_file",
+)
+
+filegroup(
+ name = "hello-world-all",
+ srcs = [
+ ":hello-world",
+ ":hello-world-pdb",
+ ],
+)
+
cc_test(
name = "hello-success_test",
srcs = ["hello-world.cc"],
pcloudy@PCLOUDY2-W MSYS ~/workspace/my_tests/bazel
$ bazel build examples/cpp:hello-world-all
...
INFO: Analyzed target //examples/cpp:hello-world-all (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/cpp:hello-world-all up-to-date:
bazel-bin/examples/cpp/hello-world.exe
bazel-bin/examples/cpp/hello-world.pdb
INFO: Elapsed time: 0.273s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
So the first solution may not work under distributed RBE either?
It's possible the RBE code doesn't respect the output groups, I'm not sure if this is a bug.
/cc @coeuvre Do you happen to know what files Bazel choose to retrieve from RBE?
Confirming, local build of all varieties we've discussed, as well as our existing logic, all emit the pdb file to bazel-out//x64_windows-opt/bin/source/exe/envoy-static.pdb
- but that isn't helping in our RBE CI pipeline where unrecognized nuggets are discarded. What we must avoid is any situation where the .pdb is triggered and invokes the linker, and the .exe is triggered and invokes the linker a second time. MSVC is basically incapable of performing a deterministic build (clang-cl is a little closer to that goal.)
This commit sort of makes me want to cry, @meteorcloudy - I don't understand the delta, why the previous case fails in a GCP RBE environment (including the correct spelling of the file group tag), but the second wins. So we have a workaround, not quite one I expected;
Any thoughts?
@wrowe I set up a remote build and did some debugging on this.
It turned out what I said before was not entirely true. The pdb file is always an output of the linking action in dbg
and fastbuild
modes. No matter what output groups you specific, as long as it triggers the linking action it should always be retrieved from remote as part of the linking action outputs.
The reason it didn't work for you is because you have optimized mode (-c opt) enabled for your remote build, in that case, the pdb file is not generated at all and is of course not an output of the linking action.
I believe if you simply remove the -c opt
option the pdb file will be retrieved.
See generated_pdb_file in windows cc toolchain.
The pdb file is always an output of the linking action in
dbg
andfastbuild
modes. No matter what output groups you specific, as long as it triggers the linking action it should always be retrieved from remote as part of the linking action outputs.
This has long been our observation, agreed.
The reason it didn't work for you is because you have optimized mode (-c opt) enabled for your remote build, in that case, the pdb file is not generated at all
This is not correct at all. As a default behavior no pdb is generated, but in a release binary, all vendors capture .pdb for windows just as they build linux/bsd release binaries with -g, and then either strip and retain the un-stripped binary, or use objcopy to split symbolic information into a distinct .debug file, pointing the release executable at that parallel .debug symbol file (which is exactly what a .pdb file on windows is).
In this case envoy and most projects add /Z7 symbols to each .obj file although these are still compiled optimized, and we compile release as "/debug /opt:ref /opt:icf" for envoy this logic lives here and here for compilation and linking;
https://github.com/envoyproxy/envoy/blob/ca4ca983195f337a840618b45b9b33cda82c1224/bazel/envoy_internal.bzl#L50
https://github.com/envoyproxy/envoy/blob/ca4ca983195f337a840618b45b9b33cda82c1224/bazel/envoy_binary.bzl#L63
[There is a related ask, which would also need to support .pdb output from /Zi rather than /Z7 for each .obj compilation even in -c opt mode, see https://github.com//pull/78]
and is of course not an output of the linking action.
Yes, this issue describes the fact that a pdb file itself needs to be captured in the linking action even in -c opt build.
I believe if you simply remove the
-c opt
option the pdb file will be retrieved.
Yes, that might be the case, but that is not our bazel use case. Our use case is for bazel to generate release binaries optimized for the end user, capturing the un-stripped symbolic information for the released binary for forensic crash dump analysis. No responsible vendor ships an executable without this; in proprietary source this is either withheld for internal use only, perhaps further stripped for public consumption with only public symbols, while in open source there is no reason to withhold or conceal this data. It's not part of the "binary distribution" but typically a distinct "-dbg", "-debug", "-debuginfo" corresponding package.
Back to our ask, we are capturing the un-stripped linux executable we distribute from envoy opt builds. The same functionality is required on windows. Any help you can offer would be greatly appreciated.
There is the totally distinct question of whether this should be the default for all -c opt output. A stripped linux binary is equivalent to "just the .exe" on Windows. And we've elected not to emit symbolic info for the hundreds and hundreds of test .exe files, since they are not production code shipping to the public, their inputs are predictable, and they can be rebuilt with symbolic info on an individual basis for deeper troubleshooting of a test case failure.
@wrowe Thanks for the explanation! I think now I understand your use case.
IIUC, you want to build with opt mode but still generate the pdb file by passing flags like /DEBUG:FULL
by yourself. But the problem is that Bazel only recognizes the pdb file as an output of the linking action when the feature generate_pdb_file
is enabled, see here.
However, currently generate_pdb_file
is only triggered in debug and fastbuild modes, and it is also restricted to be used in those two modes, check the feature definition:
generate_pdb_file_feature = feature(
name = "generate_pdb_file",
requires = [
feature_set(features = ["dbg"]),
feature_set(features = ["fastbuild"]),
],
)
Fortunately, I see you use your own generated msvc toolchain in the remote build:
https://github.com/envoyproxy/envoy-build-tools/blob/main/toolchains/configs/windows/msvc-cl
Then how about this solution:
- Remove those lines that restricts the
generated_pdb_file
feature todbg
orfastbuild
- Enable the
generated_pdb_file
feature. You can do it by:- Either in command line with
--features=generated_pdb_file
, this will enable the feature for all cc_binaries. - Or add
features = ["generate_pdb_file"]
to the cc_binary you want the pdb file generated, maybe this fits your use case better?
- Either in command line with
I manually tested this solution should work as long as you ensure /DEBUG
flag is passed, otherwise Bazel will complain the pdb file is not generated.
Let's make it so, drop the restrictions, as quickly as 4.1.0-rc5 if at all possible, and leave all defaults. Note I find the same logic to be corrected in the rules_cc repository, cc/private/toolchain/windows_cc_toolchain_config.bzl and in the bazel repository itself in tools/cpp/windows_cc_toolchain_config.bzl
This makes it possible for us to avoid deploying a forked bazel build to our GCP RBE environment, and proceed with testing various combinations of opt with pdb and fastbuild omitting pdb.
Thanks for your continued research @meteorcloudy
I have limited flexibility to test these changes in our GCP RBE environment, it would be helpful if you could do so. Once in bazel, testing the various combinations in our toolchain should be straightforward here.
I have verified the solution in a GCP RBE environment. I'm sending a change to both Bazel and rules_cc.
I don't think the fix will get into a Bazel release very soon since it's likely we won't have a rc5 for 4.1.0.
Shouldn't you have full control over https://github.com/envoyproxy/envoy-build-tools? You'll need to update this even when you have the fix in Bazel, right?
@meteorcloudy are you suggesting we put in custom bazel build into Envoy's build docker image? This will complicate things quite a bit.
When is the next bazel release?
Correct, we regenerate envoy-build-tools on every bazel or dependency bump that we want to capture.
However, we drive bazel from bazelisk, so it has to be something, even by-tag, I can have bazelisk grab. I didn't see any interesting flags to have bazelisk point to a a flavor of bazel from a non-bazelbuild repo.
@meteorcloudy are you suggesting we put in custom bazel build into Envoy's build docker image? This will complicate things quite a bit.
Nope, I thought you are using the toolchains checked in under https://github.com/envoyproxy/envoy-build-tools/tree/main/toolchains/configs/windows/msvc-cl/bazel_4.0.0/cc, so you can just fix the toolchain here. But looks like you're actually using the generated toolchain directly, then my suggestion probably won't work.
When is the next bazel release?
4.1.0 will be released soon, after that we'll start preparing for 4.2.0, we don't have a fixed timeline yet.
See bazelbuild/bazel#13099 (comment)
So, the converse of this problem, removing pdb objects from fastbuild, appears to be equally problematic...
The rules_cc msvc/clang-cl options are added after all linkflags, an antipattern to the linux implementation. This means that
/MACHINE:X64
/DEFAULTLIB:libcmt.lib
/DEBUG:FASTLINK
/INCREMENTAL:NO
will always override any linkopts specified in the cc_binary (or cc_library) and cannot be overridden, for example, with linkopts = ["/DEBUG:NONE"]
Since nocopts has been removed from bazel, it is also impossible for fastbuild to remove the msvc/clang-cl hack for the /Z7
flag. Unlike linux -g0
, there is no corresponding /Z0
override. Perhaps a custom feature recognizing an imaginary /Z0
(-Z0
) flag that would cut all previously given /Z7
and /Zi
flags might be a solution?
We should be able to toggle the feature off to avoid capturing .pdb output of the fastbuild binary link step, but this isn't sufficient to avoid the wasted build disk and cpu utilization when compiling fastbuild.
Thoughts?
Hmm,, ideally users should be able to override certain features with their own desired flags, but this isn't currently possible in our auto detected cc toolchain setup. /cc @oquenchil This is another case that our cc toolchain isn't still flexible enough.
The only solution I can think of is to use your own customized cc toolchain, which can be derived from the auto detected one. An example is here: https://github.com/meteorcloudy/my_tests/tree/master/preconfigure_msvc_toolchain
The downside is that the compiler path will be fixed absolute path in this preconfigure toolchain, so it only works for environment where compiler path is determined and matches, like RBE.
So I can report that bumping only bazelbuild/rules_cc in envoyproxy/envoy#16043 - including the capture of the bazel --output_groups=pdb_file,default - emitted no .pdb file to be copied. I'm going to presume the other half of this solution is to fix bazel itself, which I can't practically test until later in the week.
As observed at envoyproxy/envoy#16043 this may be largely resolved although the solution could be simpler.
Thanks for the notice, seems this issues can be closed.