abseil/abseil-cpp

[Bug]: missing Mutex::Dtor on linux?

Opened this issue · 17 comments

Describe the issue

While conda-forge/grpc-cpp-feedstock#348 grpc with -DgRPC_ABSL_PROVIDER="package" in conda-forge against abseil 20240116.0, the sanity check of compiling an example against the library fails:

[7/18] Linking CXX executable greeter_client
FAILED: greeter_client 
: && $PREFIX/bin/x86_64-conda-linux-gnu-c++ -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/libgrpc-1.61.0 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib CMakeFiles/greeter_client.dir/greeter_client.cc.o -o greeter_client  libhw_grpc_proto.a  $PREFIX/lib/libabsl_flags_parse.so.2401.0.0  $PREFIX/lib/libgrpc++_reflection.so.1.61.0  $PREFIX/lib/libgrpc++.so.1.61.0  $PREFIX/lib/libprotobuf.so.25.2.0  $PREFIX/lib/libgrpc.so.38.0.0  $PREFIX/lib/libaddress_sorting.so.38.0.0  $PREFIX/lib/libupb_json_lib.so.38.0.0  $PREFIX/lib/libupb_textformat_lib.so.38.0.0  $PREFIX/lib/libupb_message_lib.so.38.0.0  $PREFIX/lib/libupb_base_lib.so.38.0.0  $PREFIX/lib/libupb_mem_lib.so.38.0.0  $PREFIX/lib/libutf8_range_lib.so.38.0.0  $PREFIX/lib/libssl.so  $PREFIX/lib/libcrypto.so  $PREFIX/lib/libgpr.so.38.0.0  -ldl  -lm  -lrt  $PREFIX/lib/libabsl_random_distributions.so.2401.0.0  $PREFIX/lib/libabsl_random_seed_sequences.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_pool_urbg.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_hwaes.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_hwaes_impl.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_slow.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_platform.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_seed_material.so.2401.0.0  $PREFIX/lib/libabsl_random_seed_gen_exception.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_check_op.so.2401.0.0  $PREFIX/lib/libabsl_leak_check.so.2401.0.0  $PREFIX/lib/libabsl_die_if_null.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_conditions.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_message.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_nullguard.so.2401.0.0  $PREFIX/lib/libabsl_examine_stack.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_format.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_proto.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_log_sink_set.so.2401.0.0  $PREFIX/lib/libabsl_log_sink.so.2401.0.0  $PREFIX/lib/libabsl_log_entry.so.2401.0.0  $PREFIX/lib/libabsl_log_initialize.so.2401.0.0  $PREFIX/lib/libabsl_log_globals.so.2401.0.0  $PREFIX/lib/libabsl_vlog_config_internal.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_fnmatch.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_globals.so.2401.0.0  $PREFIX/lib/libabsl_statusor.so.2401.0.0  $PREFIX/lib/libabsl_status.so.2401.0.0  $PREFIX/lib/libabsl_strerror.so.2401.0.0  $PREFIX/lib/libabsl_flags_usage.so.2401.0.0  $PREFIX/lib/libabsl_flags_usage_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_marshalling.so.2401.0.0  $PREFIX/lib/libabsl_flags_reflection.so.2401.0.0  $PREFIX/lib/libabsl_flags_config.so.2401.0.0  $PREFIX/lib/libabsl_cord.so.2401.0.0  $PREFIX/lib/libabsl_cordz_info.so.2401.0.0  $PREFIX/lib/libabsl_cord_internal.so.2401.0.0  $PREFIX/lib/libabsl_cordz_functions.so.2401.0.0  $PREFIX/lib/libabsl_cordz_handle.so.2401.0.0  $PREFIX/lib/libabsl_crc_cord_state.so.2401.0.0  $PREFIX/lib/libabsl_crc32c.so.2401.0.0  $PREFIX/lib/libabsl_str_format_internal.so.2401.0.0  $PREFIX/lib/libabsl_crc_internal.so.2401.0.0  $PREFIX/lib/libabsl_crc_cpu_detect.so.2401.0.0  $PREFIX/lib/libabsl_raw_hash_set.so.2401.0.0  $PREFIX/lib/libabsl_hash.so.2401.0.0  $PREFIX/lib/libabsl_bad_variant_access.so.2401.0.0  $PREFIX/lib/libabsl_city.so.2401.0.0  $PREFIX/lib/libabsl_low_level_hash.so.2401.0.0  $PREFIX/lib/libabsl_hashtablez_sampler.so.2401.0.0  $PREFIX/lib/libabsl_exponential_biased.so.2401.0.0  $PREFIX/lib/libabsl_flags_private_handle_accessor.so.2401.0.0  $PREFIX/lib/libabsl_flags_commandlineflag.so.2401.0.0  $PREFIX/lib/libabsl_bad_optional_access.so.2401.0.0  $PREFIX/lib/libabsl_flags_commandlineflag_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_program_name.so.2401.0.0  $PREFIX/lib/libabsl_synchronization.so.2401.0.0  $PREFIX/lib/libabsl_graphcycles_internal.so.2401.0.0  $PREFIX/lib/libabsl_kernel_timeout_internal.so.2401.0.0  $PREFIX/lib/libabsl_time.so.2401.0.0  $PREFIX/lib/libabsl_civil_time.so.2401.0.0  $PREFIX/lib/libabsl_time_zone.so.2401.0.0  $PREFIX/lib/libabsl_stacktrace.so.2401.0.0  $PREFIX/lib/libabsl_symbolize.so.2401.0.0  $PREFIX/lib/libabsl_strings.so.2401.0.0  $PREFIX/lib/libabsl_strings_internal.so.2401.0.0  $PREFIX/lib/libabsl_string_view.so.2401.0.0  $PREFIX/lib/libabsl_int128.so.2401.0.0  $PREFIX/lib/libabsl_throw_delegate.so.2401.0.0  $PREFIX/lib/libabsl_malloc_internal.so.2401.0.0  $PREFIX/lib/libabsl_debugging_internal.so.2401.0.0  $PREFIX/lib/libabsl_demangle_internal.so.2401.0.0  $PREFIX/lib/libabsl_base.so.2401.0.0  -lpthread  $PREFIX/lib/libabsl_raw_logging_internal.so.2401.0.0  $PREFIX/lib/libabsl_log_severity.so.2401.0.0  $PREFIX/lib/$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::CompletionQueue::~CompletionQueue()':
helloworld.grpc.pb.cc:(.text._ZN4grpc15CompletionQueueD2Ev[_ZN4grpc15CompletionQueueD5Ev]+0x4d): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::ClientReader<helloworld::HelloReply>::~ClientReader()':
helloworld.grpc.pb.cc:(.text._ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED2Ev[_ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED5Ev]+0x6c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `non-virtual thunk to grpc::ClientReader<helloworld::HelloReply>::~ClientReader()':
helloworld.grpc.pb.cc:(.text._ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED2Ev[_ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED5Ev]+0x10c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `non-virtual thunk to grpc::ClientReaderWriter<helloworld::HelloRequest, helloworld::HelloReply>::~ClientReaderWriter()':
helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x7c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::ClientReaderWriter<helloworld::HelloRequest, helloworld::HelloReply>::~ClientReaderWriter()':
helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x12c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o):helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x1dc): more undefined references to `absl::lts_20240116::Mutex::Dtor()' follow
collect2: error: ld returned 1 exit status

Previously I had opened #1614 and grpc/grpc#35854, but it now looks like this might be related to f3760b4. In particular, the following looks like a likely culprit

#if defined(__APPLE__) || defined(ABSL_BUILD_DLL)
// When building a dll symbol export lists may reference the destructor
// and want it to be an exported symbol rather than an inline function.
// Some apple builds also do dynamic library build but don't say it explicitly.
Mutex::~Mutex() { Dtor(); }
#endif

I'm not sure what grpc does that (apparently) invalidates the assumptions explained in the commit message of that change.

Steps to reproduce the problem

Build https://github.com/grpc/grpc with -DgRPC_ABSL_PROVIDER="package" against abseil 20240116.0 and then compile the grpc helloworld example. It would also be possible to replay the recipe we have in conda-forge.

What version of Abseil are you using?

20240116.0

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC 12

What build system are you using?

CMake

Additional context

No response

CC @dvyukov as the author of f3760b4.

Relevant context is also that we're using shared builds of abseil.

I ended up building a version of abseil (published in a separate channel) with the following patch:

diff --git a/absl/synchronization/mutex.cc b/absl/synchronization/mutex.cc
index cb3c7e74..76d561df 100644
--- a/absl/synchronization/mutex.cc
+++ b/absl/synchronization/mutex.cc
@@ -731,12 +731,10 @@ static unsigned TsanFlags(Mutex::MuHow how) {
 }
 #endif

-#if defined(__APPLE__) || defined(ABSL_BUILD_DLL)
 // When building a dll symbol export lists may reference the destructor
 // and want it to be an exported symbol rather than an inline function.
 // Some apple builds also do dynamic library build but don't say it explicitly.
 Mutex::~Mutex() { Dtor(); }
-#endif

 #if !defined(NDEBUG) || defined(ABSL_HAVE_THREAD_SANITIZER)
 void Mutex::Dtor() {
diff --git a/absl/synchronization/mutex.h b/absl/synchronization/mutex.h
index d53a22bb..2cd5a1e4 100644
--- a/absl/synchronization/mutex.h
+++ b/absl/synchronization/mutex.h
@@ -1064,11 +1064,6 @@ inline Mutex::Mutex() : mu_(0) {

 inline constexpr Mutex::Mutex(absl::ConstInitType) : mu_(0) {}

-#if !defined(__APPLE__) && !defined(ABSL_BUILD_DLL)
-ABSL_ATTRIBUTE_ALWAYS_INLINE
-inline Mutex::~Mutex() { Dtor(); }
-#endif
-
 #if defined(NDEBUG) && !defined(ABSL_HAVE_THREAD_SANITIZER)
 // Use default (empty) destructor in release build for performance reasons.
 // We need to mark both Dtor and ~Mutex as always inline for inconsistent

With this, the grpc example compiles again.

Could this simply be a case of the shared library case on linux not being covered correctly?

@derekmauro @dvyukov, could you comment about the (un-)inlining of the Mutex destructor, or alternatively, chime in on the linked grpc issue if their usage of the Mutex is somehow not supported?

Do you use bazel or cmake build?

I am not an expert on all open-source absl build modes.
Is it possible to add a test for this?

I see this code does something similar:

#if !defined(__APPLE__) && !defined(ABSL_BUILD_DLL) && \
!defined(ABSL_CONSUME_DLL)
#define ABSL_INTERNAL_INLINE_CURRENT_THREAD_IDENTITY_IF_PRESENT 1
#endif

and it also checks ABSL_CONSUME_DLL. Should we check ABSL_CONSUME_DLL here as well? Though, not sure how this is related to consuming dll's.

I see that this:

if (BUILD_SHARED_LIBS AND MSVC)
set(ABSL_BUILD_DLL TRUE)
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
else()

exports ABSL_BUILD_DLL only for Windows (MSVC).
Is there a macro for BUILD_SHARED_LIBS mode? Perhaps we should check that macro?

Hey @dvyukov, thanks for the response. We're using CMake to build, which correctly sets ABSL_BUILD_DLL on windows (and ABSL_CONSUME_DLL gets correspondingly set when building against shared abseil).

My understanding is that this mechanism originally was intended (only?) for __declspec(dllexport) (resp. dllimport), which is necessary to make shared builds work on windows at all.

Now the problem only appears on Linux, where usually, ABSL_BUILD_DLL should not be a thing, according to my understanding. But if that's the preferred solution, I'm happy to set that symbol also on Linux while building.

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

If I read your first patch correctly, it effectively undoes the optimization, so it's not good.

I am not expect on absl build modes and macros, especially used in open-source. So I will defer this to absl maintainers. @derekmauro do you know who can say what's the right fix for this?

I'll take a look when I can. Please be patient.

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

If I read your first patch correctly, it effectively undoes the optimization, so it's not good.

My intention was not to suggest that the patch should be merged, what I meant was that for our specific scenario (builds against a shared abseil) the removal of the #if had the same effect as defining ABSL_BUILD_DLL.

I'll take a look when I can. Please be patient.

I hope that a first ping after a week is not considered excessive. 😅

It is blocking a lot of work in our distribution though, so I was mainly looking for overall direction, rather than a fix. For now, I don't even know if this is an issue in abseil or in grpc, but I'm assuming that lots of places implicitly use the Mutex, and since all our builds are shared, I expect this to be too much to fix everywhere on the side of abseil-consumers (though perhaps I'm wrong though and it's grpc-specific after all!).

The upshot is that I'm looking towards just uninlining the Dtor (in our distribution) for 20240116.x, which should at least unblock us for now, and we can then reconsider for the summer release. If you think I should hold off on that for ${reason}, please let me know.

I haven't tried this, but I ran the build "in my head" (which tends to be unreliable). I think it is likely caused by mixing an installed absl library built with -DNDEBUG with a gRPC library that includes absl headers that is not being built with -DNDEBUG. If this is the case, it is something I've been begging people not to do.

This is one of those build modes I would like to say is not supported, as I regularly have to tell people when they only instrument half of their builds with sanitizers, but the fact of the matter is that with -DNDEBUG, people get away with doing this 99.9% of the time.

One option which I really don't like is to patch this for open-source users, but keep @dvyukov's optimization for internal Google users only. This would be an annoying maintainability problem. I'll have to test it to see how bad it actually looks.

Thank you for the response!

This is an interesting case, because we do set -DNDEBUG for our release builds, but the failure occurred while naïvely building examples/cpp/helloworld from grpc, which doesn't set that symbol.

It's surprising to me that the logic here is

// This macro is inspired by
// https://akrzemi1.wordpress.com/2017/05/18/asserts-in-constexpr-functions/
#if defined(NDEBUG)
#define ABSL_ASSERT(expr) \
(false ? static_cast<void>(expr) : static_cast<void>(0))
#else
#define ABSL_ASSERT(expr) \
(ABSL_PREDICT_TRUE((expr)) ? static_cast<void>(0) \
: [] { assert(false && #expr); }()) // NOLINT
#endif

rather than, say

#ifdef(DEBUG)
  <debug-case>
#else
  <nodebug-case>
#fi

The latter would not open up that footgun of a vanilla build (no options) having a different ABI than one with -NDEBUG.

If this is the case, it is something I've been begging people not to do.

I understand, though the realities of a distribution are that we have to use pre-built artefacts. If that means that we have to put NDEBUG in the library interface, we can do that

This is an interesting case, because we do set -DNDEBUG for our release builds, but the failure occurred while naïvely building examples/cpp/helloworld from grpc, which doesn't set that symbol.

OK, I misdiagnosed that (probably trying to fit your analysis subconsciously) -- the compiler setup is the same in all cases, so we do set -DNDEBUG even for that grpc example. I've double-checked this in conda-forge/grpc-cpp-feedstock#357.

OK, I misdiagnosed that [...]

Yet another update: while our compiler setup is the same (CPPFLAGS contains -DNDEBUG, CXXFLAGS does not), it seems that abseil and grpc differ in how they pick this up - abseil consequently does build with -DNDEBUG, while grpc (or at least the example) doesn't. Adding -DNDEBUG to the CXXFLAGS for the grpc example does make it work again, validating your analysis.

I have also an issue with the inlined desctructor, but on windows when building static libs of abseil.
When I build ortools v9.9 which uses abseil 202401 version of abseil, I get multiply defined symbols for Mutex::Dtor().
Ortools is quite big and it uses abseil, but also Protobuf as dependencies. Up to now I did not test it with the (rollback-)patch, but this seems likely a fix for the issue.

Does this build include inconsistent NDEBUG defines (as @derekmauro mentioned here #1624 (comment))?

Not that I am aware of. The ortools cmake machinery is quite complex so I cannot easily verify that, maybe I find the time to dig into that more. Another possibility could be a wrongly set or logically incorrect usage of the flag BUILD_ABSL_DLL.

I'd like to add, for anyone coming across similar problems, that not defining the NDEBUG macro in your Release build will cause very cryptic errors at the linker stage; in my case, a dependency on grpc (and therefore protobuf and abseil) managed via vcpkg in an MSbuild project on Windows produced ODR Violation linker errors exactly in Mutex::Dtor. It wasn't until I randomly stumbled across this exact thread after hours of increasingly desperate internet searches that I noticed the missing NDEBUG definition on my command line. Obvious error in retrospect, but tracing that down backwards from the link stage to the preprocessor was pretty nightmarish. It feels like this should be mentioned in documentation somewhere, so other developers don't have to go through that ordeal. Maybe this comment will help someone directly or serve as a starting point for this information to be more visible to library users.