[Bug]: Missing constructor for `absl::synchronization_internal::KernelTimeout` in shared builds
Describe the issue
While building TensorFlow against the new Abseil 20240116.1, we ran into the following linker error:
```
# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: bazel-out/k8-opt/bin/external/local_xla/xla/service/libslow_operation_alarm.pic.a(slow_operation_alarm.pic.o): in function `xla::SlowOperationAlarm::AlarmLoop()':
slow_operation_alarm.cc:(.text._ZN3xla18SlowOperationAlarm9AlarmLoopEv+0x21c): undefined reference to `absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'
```
While it's clear where this symbol lives, we cannot link to `absl/synchronization:kernel_timeout_internal`, because it has private visibility.
However, after realizing that TensorFlow never uses `KernelTimeout` directly in its codebase, I went back to look at Abseil, and I think this is a similar situation to #1624, because AFAICT it concerns `absl::Mutex` and inlining (1cf6469).
In particular, in a shared build, some methods have to construct a `synchronization_internal::KernelTimeout`, e.g. in `abseil-cpp/absl/synchronization/mutex.h`, lines 375 to 381 (at commit 2f9e432).
Since that constructor cannot be found (due to visibility), linking fails.
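To illustrate why a caller that never names `KernelTimeout` can still end up referencing its constructor, here is a minimal sketch of the pattern as I understand it. All names (`SketchMutex`, `AwaitWithTimeout`, `AwaitCommon`, `sketch_internal::KernelTimeout`) are hypothetical stand-ins, not Abseil's actual code, and `std::chrono` replaces `absl::Time`/`absl::Duration` so the snippet is self-contained. The point is only that a method defined inline in a header gets compiled into the caller's object file, so the caller itself needs the out-of-line constructor symbol at link time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for absl::synchronization_internal::KernelTimeout.
// Users of the "header" only see the constructor's declaration.
namespace sketch_internal {
class KernelTimeout {
 public:
  explicit KernelTimeout(std::chrono::nanoseconds timeout);  // defined out of line
  std::int64_t nanos() const { return nanos_; }

 private:
  std::int64_t nanos_;
};
}  // namespace sketch_internal

// Hypothetical stand-in for absl::Mutex. The public method is defined inline
// in the "header", so its body, including the KernelTimeout constructor call,
// is compiled into every caller's object file (e.g. slow_operation_alarm.o),
// even though the caller never mentions KernelTimeout itself.
class SketchMutex {
 public:
  bool AwaitWithTimeout(std::chrono::nanoseconds timeout) {
    return AwaitCommon(sketch_internal::KernelTimeout(timeout));
  }

 private:
  // Placeholder: only reports whether a positive timeout was supplied;
  // no locking or waiting happens in this sketch.
  bool AwaitCommon(sketch_internal::KernelTimeout t) { return t.nanos() > 0; }
};

// In the real build, this definition would live in a separate library. If the
// final link cannot resolve it, the caller fails with
// "undefined reference to ...KernelTimeout::KernelTimeout(...)".
// It is defined here only so the sketch links on its own.
namespace sketch_internal {
KernelTimeout::KernelTimeout(std::chrono::nanoseconds timeout)
    : nanos_(timeout.count()) {}
}  // namespace sketch_internal
```

Compiling a caller of `SketchMutex::AwaitWithTimeout` in its own translation unit and omitting the out-of-line constructor definition from the link should reproduce the same class of error as the log in this report.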
Steps to reproduce the problem
Build tensorflow against abseil 20240116.1
What version of Abseil are you using?
20240116.1
What operating system and version are you using?
Linux
What compiler and version are you using?
GCC 12, nvcc 12.0
What build system are you using?
bazel
Additional context
No response
Does the patch proposed in #1624 (comment) fix the problem? That would be a hint that this is actually the same issue as #1624.
We ran into this problem with the patched abseil. There are three possibilities as far as I can see:
- the patch is independent of the bug
- the patch uncovered the bug
- the patch introduced the bug
I still think that it's very closely related, because constructors/destructors are (apparently) missing from the shared library in the same way.
However, it is not the same issue, in the sense that the TensorFlow builds do set `NDEBUG`.
@derekmauro, what do you think about the situation here now? I get that you'd like constructors/destructors to be inlined as much as possible, but given that `synchronization_internal::KernelTimeout` is used from another library (`absl_mutex`), this doesn't work in shared builds.
I haven't had time to look into this, sorry.
Yeah, it's OK. I managed to dig a bit more and found it's an interaction between bazel/tensorflow and their vendored abseil. I'll close this, as there's nothing to do for abseil AFAICT.