abseil/abseil-cpp

[Bug]: Missing constructor for `absl::synchronization_internal::KernelTimeout` in shared builds

Describe the issue

While building tensorflow against the new abseil 20240116.1, we ran into the following linker error:

# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: bazel-out/k8-opt/bin/external/local_xla/xla/service/libslow_operation_alarm.pic.a(slow_operation_alarm.pic.o): in function `xla::SlowOperationAlarm::AlarmLoop()':
slow_operation_alarm.cc:(.text._ZN3xla18SlowOperationAlarm9AlarmLoopEv+0x21c): undefined reference to `absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'

While it's clear where this symbol lives, we cannot link to absl/synchronization:kernel_timeout_internal, because it has private visibility.

However, after realizing tensorflow never explicitly uses KernelTimeout in its codebase, I decided to look back at abseil, and I think this is a similar situation to #1624, because AFAICT it concerns absl::Mutex and inlining (1cf6469).

In particular, in a shared build, some methods have to construct a synchronization_internal::KernelTimeout, e.g.

bool AwaitWithTimeout(const Condition& cond, absl::Duration timeout) {
  return AwaitCommon(cond, synchronization_internal::KernelTimeout{timeout});
}
bool AwaitWithDeadline(const Condition& cond, absl::Time deadline) {
  return AwaitCommon(cond, synchronization_internal::KernelTimeout{deadline});
}

Since that constructor cannot be found (due to visibility), linking fails.
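To illustrate the pattern I mean, here is a minimal single-file sketch. All names in it (`demo`, `Timeout`, `Mutex`, the nanosecond parameter) are hypothetical stand-ins, not Abseil's real API. It only shows why a header-inline method that builds an internal type makes the caller's object file reference that type's out-of-line constructor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not Abseil's real classes.
namespace demo {
namespace internal {

// Plays the role of KernelTimeout: the class is visible in a header, but
// its constructor is only declared here and defined out of line.
class Timeout {
 public:
  explicit Timeout(std::int64_t nanos);
  std::int64_t nanos() const { return nanos_; }

 private:
  std::int64_t nanos_;
};

}  // namespace internal

// Plays the role of absl::Mutex: the public method is defined inline in the
// header, so the call to Timeout's constructor is compiled into whichever
// object file calls AwaitWithTimeout, not into the library itself.
class Mutex {
 public:
  bool AwaitWithTimeout(std::int64_t nanos) {
    return AwaitCommon(internal::Timeout{nanos});
  }

 private:
  // Simplified: reports whether any time remains after clamping.
  bool AwaitCommon(const internal::Timeout& t) { return t.nanos() > 0; }
};

// In a real library this definition would live in a separate .cc file.
// If the resulting symbol is not exported from (or not linked into) the
// final binary, every caller of AwaitWithTimeout gets an undefined
// reference to this constructor, even though it never names Timeout itself.
internal::Timeout::Timeout(std::int64_t nanos)
    : nanos_(nanos < 0 ? 0 : nanos) {}

}  // namespace demo
```

In this single translation unit everything links, of course. The failure only appears once the constructor definition sits in a separate library whose symbol the caller cannot reach.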

Steps to reproduce the problem

Build tensorflow against abseil 20240116.1

What version of Abseil are you using?

20240116.1

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC 12, nvcc 12.0

What build system are you using?

bazel

Additional context

No response

Does the patch proposed in #1624 (comment) fix the problem? That would be a hint that this is actually the same issue as #1624.

We ran into this problem with the patched abseil. There are three possibilities as far as I can see:

  • the patch is independent of the bug
  • the patch uncovered the bug
  • the patch introduced the bug

I still think that it's very closely related, because of how constructors/destructors are (apparently) missing in the shared library.

However, it is not the same in the sense that the tensorflow builds do set NDEBUG.

@derekmauro, what do you think about the situation here now? I get that you'd like constructors/destructors to be inlined as much as possible, but given that synchronization_internal::KernelTimeout is used from another library (absl_mutex), this doesn't work in shared builds.

I haven't had time to look into this, sorry.

Yeah, it's OK. I managed to dig a bit more and found it's an interaction between bazel/tensorflow and their vendored abseil. I'll close this, as there's nothing to do for abseil AFAICT.