iains/gcc-12-branch

GCC 12.2.0 breaks TLS (thread_local) on Darwin at least on x86_64

boris-kolpackov opened this issue · 6 comments

Reproducer:

// file: tl.hxx
#pragma once

extern thread_local void* tl;

inline void* gt_tl () {return tl;}
// file: tl.cxx
#include "tl.hxx"

thread_local void* tl = nullptr;
// file: main.cxx
#include "tl.hxx"

int main ()
{
  return gt_tl () == nullptr ? 0 : 1;
}
$ g++-12 tl.cxx main.cxx
Undefined symbols for architecture x86_64:
  "__ZTH2tl", referenced from:
      __ZTW2tl in cc4nuRRN.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

Interestingly, if we combine the two files into a single translation unit, then the error goes away (which perhaps suggests it's a symbol visibility issue):

$ cat tl.cxx main.cxx >tl-main.cxx
$ g++-12 -o tl tl-main.cxx

This has been reproduced using Homebrew GCC 12.2.0 on MacOS 12.5-x86_64 with CLT 13.4.0.0.1.1651278267 and originally reported as Homebrew issue Homebrew/homebrew-core#110673

I can reproduce on aarch64-apple-darwin21 with g++ from Homebrew, which is gcc-12-2-darwin-pre-r1 in this branch.

$ g++-12 tl.cxx main.cxx -c
$ nm tl.o  
0000000000000020 s ___emutls_t.tl
0000000000000000 D ___emutls_v.tl
0000000000000000 t ltmp0
0000000000000000 d ltmp1
0000000000000020 s ltmp2
$ nm main.o 
0000000000000068 s EH_frame1
0000000000000030 T __Z5gt_tlv
                 U __ZTH2tl
0000000000000000 T __ZTW2tl
                 U ___emutls_get_address
                 U ___emutls_v.tl
0000000000000048 T _main
0000000000000000 t ltmp0
0000000000000068 s ltmp1

Our build is:

Configured with: ../configure --prefix=/opt/homebrew/opt/gcc --libdir=/opt/homebrew/opt/gcc/lib/gcc/current --disable-nls --enable-checking=release --with-gcc-major-version-only --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-12 --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 12.2.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --with-system-zlib --build=aarch64-apple-darwin21 --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk

I will launch a built “outside homebrew” and report back.

Edit: I cannot bootstrap anymore because of the linker bug in Xcode 14 (#6), so I have no way to confirm outside of homebrew right now. But I would be surprised if it was a problem on our side.

iains commented

sorry it took a while to get to this (last weekend was the GNU cauldron, and otherwise rather busy in the weeks).

The change that triggers this issue fixes a real bug in the handling of thread-local initialisers between different TUs.

Putting all the code in one TU will, indeed, "make the problem go away" because the cross-TU fix is no longer invoked.

I'd rather not back out the cross-TU thread local init fix, if possible, so need to explore how to fix the ld64 complaint ....

----- the actual problem:

AFAICT from initial analysis this is the (sometimes irritating) difference between the semantics of ELF and Mach-O weak references. The __ZTH2tl symbol is a weak reference (i.e. it might not be present or required at runtime and absence is not an error) - we check for its presence in the init code before we try to call it.

Actually, this all works fine on macOS at runtime .. the weak reference behaviour is fully supported.

The problem is at link time - where ld64 does not allow the weak reference to be undefined by default (where ELF is quite happy with this). The idea is that the linker adds information about where the weak symbol would be found if present (at runtime it not required to be present as noted above).

One work-around is to specify "-Wl,-undefined,dynamic_lookup" to the link line in cases like the one you cite .. but that is also irritating.

Now I need to go back to the test-cases that are motivation for the fix and see if there is some other way to make ld64 happy again.

iains commented

OK. So the analysis seems to be sound.

We have fixed a long-standing issue where external TLS variables that require dynamic initialisers would not have had those inits run in the case that the referring TU did not contain the init.

The solution to this is to arrange to call a weak reference function to do the init (since we cannot know in a use TU whether the init is static or dynamic).

Darwin's weak references have the twist noted above (it makes perfect sense when talking about some Framework that might be provided optionally, but it makes a lot less sense in the case like this - where we have some anonymous variable that is not present in the user's code).

Since we cannot change the underlying Mach-O behaviour, a work-around is called for.

I am testing a workaround that provides a weak dummy init function at the referencing end - this will satisfy the weak reference at link time if there is no NON-weak version. Therefore, for cases like the report above (where there is no dynamic init needed or created) the dummy will be used - which is a NOP.

If there is a dynamic init, then that has a non-weak definition which overrides the weak one provided locally (at static link time).

This workaround mechanism has been used several times before for Darwin - however, in this particular case it is much more invasive in the codebase. The patch I am testing would almost certainly not be acceptable upstream, so I will need to take advice on some way to encapsulate the target issues.

Will add a note here when the update is pushed.

iains commented

so this is fixed in the pre-release but I'll leave the issue open until 12.2 release is done.

iains commented

there is a fix now in the branch, but it needs more work before it can be applied upstream.