iains/gcc-12-branch

Possible regression between 12.1-r1 and 12.2-r0 on Intel

fxcoudert opened this issue · 17 comments

While testing 12.2-r0 at Homebrew (Homebrew/homebrew-core#108516), we are seeing the same failure on two different packages, specifically on Intel macOS (10.15, 11 and 12, so it's pretty consistent).

We take a binary (abyss-rresolver-short) compiled and linked against GCC 12.1-r1 and run it with the libraries from GCC 12.2-r0, and we get:

dyld: Symbol not found: ___emutls_get_address
    Referenced from: /usr/local/Cellar/abyss/2.3.5_1/bin/abyss-rresolver-short
    Expected in: /usr/local/opt/gcc/lib/gcc/current/libstdc++.6.dylib
   in /usr/local/Cellar/abyss/2.3.5_1/bin/abyss-rresolver-short

I'll dig into this more later and try to see what symbols are present in the two versions, but I'm putting this here in the meantime in case it rings a bell.

Could it be related to e5b0693 ?

iains commented

hmm .. that looks possibly like we had a bug in 12.1..
libstdc++.6.dylib should not be exporting ___emutls_get_address (that symbol is provided by libgcc_s.1.1.dylib)
so, libstdc++.dylib should have an "undefined" for it expecting to find it in libgcc_s.1.1.

I need to take a look at 12.1 and 12.2 libraries and see what's happened.
(and then figure out if there's a graceful way to recover)

iains commented

Could it be related to e5b0693 ?

I think libstdc++ already had dependencies on emuTLS (e.g. for the 00000000001b4f20 D ___emutls_v._ZSt11__once_call), so I would not expect that change to break things, but ICBW. Not likely to have any time to look at this today, unfortunately.

iains commented

FAOD - are you saying that this is not seen on Arm64? (that seems quite unexpected, since there's nothing arch-specific here AFAIU .. unless it's a difference in dyld behaviour).

Looking at the Homebrew bottles for 12.1, I see:

$ nm intel_monterey/gcc/12.1.0/lib/gcc/current/libstdc++.6.dylib|grep emutls
                 U ___emutls_get_address
                 I ___emutls_get_address (indirect for ___emutls_get_address)
                 U ___emutls_register_common
                 I ___emutls_register_common (indirect for ___emutls_register_common)
00000000001c6cc0 D ___emutls_v._ZSt11__once_call
00000000001c6ca0 D ___emutls_v._ZSt15__once_callable
$ nm arm_monterey/gcc/12.1.0/lib/gcc/current/libstdc++.6.dylib|grep emutls  
                 U ___emutls_get_address
                 I ___emutls_get_address (indirect for ___emutls_get_address)
                 U ___emutls_register_common
                 I ___emutls_register_common (indirect for ___emutls_register_common)
00000000001a6338 D ___emutls_v._ZSt11__once_call
00000000001a6318 D ___emutls_v._ZSt15__once_callable

So they look the same. But we're seeing the issue only on Intel, and not ARM. I have yet to figure out why. The binaries for abyss have:

$ nm abyss_intel/abyss/2.3.5_1/bin/abyss-rresolver-short | grep emutls
                 U ___emutls_get_address
000000010003c6c0 S ___emutls_t._ZZN6btllib9SeqReader19ready_records_arrayEvE3var
00000001000579a0 D ___emutls_v._ZGVZN6btllib9SeqReader19ready_records_arrayEvE3var
0000000100057980 D ___emutls_v._ZZN6btllib9SeqReader19ready_records_arrayEvE3var
00000001000579c0 D ___emutls_v._ZZN6btllib9SeqReader20ready_records_ownersEvE3var
$ nm abyss_arm/abyss/2.3.5_1/bin/abyss-rresolver-short | grep emutls
000000010002bd40 T ___emutls_get_address
000000010002bf34 T ___emutls_register_common
000000010003ae28 S ___emutls_t._ZZN6btllib9SeqReader19ready_records_arrayEvE3var
0000000100052920 D ___emutls_v._ZGVZN6btllib9SeqReader19ready_records_arrayEvE3var
0000000100052900 D ___emutls_v._ZZN6btllib9SeqReader19ready_records_arrayEvE3var
0000000100052940 D ___emutls_v._ZZN6btllib9SeqReader20ready_records_ownersEvE3var
000000010002bcb0 t _emutls_destroy
000000010002bd10 t _emutls_init
0000000100053c40 b _emutls_key
0000000100052a48 d _emutls_mutex
0000000100053c38 b _emutls_size

So they have a clear difference between Intel and ARM. Not sure how that came to be, since they were rebuilt at the same time, using the same GCC. Weird…
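For anyone wanting to repeat this comparison on other bottles, here is a minimal Python sketch that reads `nm` output and reports the type letters seen for each emutls symbol. The function name `emutls_symbols` and the sample strings are illustrative assumptions; the sample lines are copied from the listings above, and the parsing only assumes the two line shapes shown there (with and without a leading address).

```python
# Sample lines copied from the nm listings in this thread.
INTEL_NM = """\
                 U ___emutls_get_address
000000010003c6c0 S ___emutls_t._ZZN6btllib9SeqReader19ready_records_arrayEvE3var
"""

ARM_NM = """\
000000010002bd40 T ___emutls_get_address
000000010002bf34 T ___emutls_register_common
000000010002bcb0 t _emutls_destroy
"""

LIBSTDCXX_12_1_NM = """\
                 U ___emutls_get_address
                 I ___emutls_get_address (indirect for ___emutls_get_address)
"""


def emutls_symbols(nm_output):
    """Map each emutls-related symbol name to the set of nm type letters seen for it."""
    found = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if len(fields[0]) == 1:
            # No address column: "<type> <name> ..." (undefined or indirect).
            kind, name = fields[0], fields[1]
        elif len(fields) >= 3:
            # Address column present: "<address> <type> <name>".
            kind, name = fields[1], fields[2]
        else:
            continue
        if "emutls" in name:
            found.setdefault(name, set()).add(kind)
    return found


if __name__ == "__main__":
    for label, text in (("intel", INTEL_NM), ("arm", ARM_NM)):
        print(label, emutls_symbols(text).get("___emutls_get_address"))
```

On these samples, the Intel binary only has an undefined reference (`U`) to ___emutls_get_address, while the ARM binary defines it (`T`), which matches the difference described above.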

Further edit: OK another difference is that the ARM binary for abyss is not linking to libgcc_s, for some reason:

$ otool -L abyss_intel/abyss/2.3.5_1/bin/abyss-rresolver-short
abyss_intel/abyss/2.3.5_1/bin/abyss-rresolver-short:
	@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.30.0)
	@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
	@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
$ otool -L abyss_arm/abyss/2.3.5_1/bin/abyss-rresolver-short
abyss_arm/abyss/2.3.5_1/bin/abyss-rresolver-short:
	@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.30.0)
	@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
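The same kind of check can be scripted for the `otool -L` listings. This is a rough sketch, assuming the output shape shown above (an unindented first line naming the binary, then indented dependency lines); `linked_libraries` is a hypothetical helper name and the sample strings are abbreviated copies of the output above.

```python
import posixpath

# Abbreviated copies of the otool -L output shown above.
INTEL_OTOOL = """\
abyss_intel/abyss/2.3.5_1/bin/abyss-rresolver-short:
\t@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.30.0)
\t@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
\t@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
"""

ARM_OTOOL = """\
abyss_arm/abyss/2.3.5_1/bin/abyss-rresolver-short:
\t@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.30.0)
\t@@HOMEBREW_PREFIX@@/opt/gcc/lib/gcc/current/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
"""


def linked_libraries(otool_output):
    """Return the base names of the libraries listed in `otool -L` output."""
    libs = set()
    for line in otool_output.splitlines():
        # Dependency lines are indented; the header line naming the binary is not.
        if not line[:1].isspace():
            continue
        path = line.split(" (compatibility version")[0].strip()
        if path:
            libs.add(posixpath.basename(path))
    return libs


if __name__ == "__main__":
    print(linked_libraries(INTEL_OTOOL) - linked_libraries(ARM_OTOOL))
```

The set difference is the library that only the Intel binary links against, here libgcc_s.1.1.dylib.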

iains commented

there's a bunch of stuff to take in here.

  1. I ___emutls_get_address (indirect for ___emutls_get_address) that seems to indicate that the symbol is being re-exported from libstdc++ .. which I would not have expected ..

  2. the ___emutls_get_address stuff is weak - so that either the exe should provide a concrete instance (which arm is doing) or it should be found from libgcc_s.1.1.dylib .. which is what I'd expect if the shared lib is available;
    -- OTOH, with rpaths switched off ... then the shared lib cannot be used on 10.11+ .. because the build will fail if it is...

having said that, the specs that handle this are in arch-independent code, which implies that the difference is in the way that ld64 is dealing with the weak symbols maybe (was the Xcode toolset version the same in both cases?)

humph .. what this is telling me is that I don't even know where the problem lies .. but I do know (at least expect) that the same process is applied to all archs so differences in this could be explained by differences in tools or configuration.

iains commented

there is a clear difference between 12.1-r1 and 12.2-r0 libstdc++ where the former appears to be (I would have said incorrectly) re-exporting ___emutls_xxxx

edit:
However (1) both 12.1 and 12.2 libraries have the same library deps, the difference seems to be that 12.1 is re-exporting and 12.2 is not.
However (2) there is no actual library re-export line, so the re-exporting is from the symbol table...

I will have to look at the changes to libstdc++ between the two revisions; I have not (intentionally) made any change to the runtimes in 12.2 other than eliminating the build-time run paths... so either a merge error or some upstream change would seem likely ..

OTOH the 12.2 output seems correct to me .. libstdc++ has no business re-exporting symbols from libgcc_....

edit 2: And I cannot be sure that the same version of ld64 was used to build both libraries, there's been Xcode updates in the interval .. sigh, some rebuilding might be needed.

iains commented

This release fixed a bug in the handling of libtool options, which affects the cases where we are using embedded runpaths (macOS 10.11+). The same fix also fixed the handling of 'nodefaultexport', which was intended to suppress the export of emulated TLS symbols when a shared libgcc is in use. So the bug fix does change the exports.

So the long explanation is this:

When we are on macOS 10.11+ and we cannot use a shared libgcc_s (because of the DYLD_LIBRARY… shenanigans) we incorporate a weak version of the emulated tls code in each shared object that needs it.

Those weak versions must be exported - so that dyld can choose just one instance at load time.

When we have @rpaths, we no longer have this constraint - since we can now build and use a shared libgcc_s … and it is not necessary for other shared objects to each have a (wasted) copy of the emutls code.

The intent was that (under the second circumstance) there would be no exported emutls symbols from any other shared object than libgcc_s …. but, as noted above, the libtool invocations were buggy and we have been unintentionally re-exporting those symbols.

Now, I think that everything should still work in the case that we have a mixture of uses of shared libgcc_s and hard-linked emulated TLS code, since the instances are all still weak - thus dyld should resolve just one at load time (most likely to the libgcc_s one, but that can depend on the order in which DSOs are loaded).

So .. probably the simplest way forward is to revert the part of the libtool fix that fixes 'nodefaultexport', at least on 12.x .. we can think about whether this is enough of a problem to bump the SO version for libstdc++ in gcc-13.
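To make the failure mode concrete, here is a toy Python model of why a binary linked against 12.1-r1 stops loading with the 12.2-r0 libraries. It assumes only what the dyld error in the original report shows: the binary recorded libstdc++.6.dylib as the place where ___emutls_get_address is expected. The function `missing_bindings` and the export sets are simplified illustrations, not dyld's actual algorithm, and weak-symbol coalescing is deliberately left out.

```python
def missing_bindings(bindings, exports):
    """Return (symbol, library) pairs the loader could not bind.

    bindings: {symbol: library recorded at link time as the provider}
    exports:  {library: set of symbols that library exports}
    """
    return [
        (symbol, library)
        for symbol, library in sorted(bindings.items())
        if symbol not in exports.get(library, set())
    ]


# What the abyss binary recorded when linked against 12.1-r1
# (taken from the "Expected in:" line of the dyld error).
ABYSS_BINDINGS = {"___emutls_get_address": "libstdc++.6.dylib"}

# Simplified export sets: 12.1-r1 libstdc++ unintentionally re-exported the
# emutls symbol, 12.2-r0 does not. Both ship it in libgcc_s.1.1.dylib.
EXPORTS_12_1 = {
    "libstdc++.6.dylib": {"___emutls_get_address"},
    "libgcc_s.1.1.dylib": {"___emutls_get_address"},
}
EXPORTS_12_2 = {
    "libstdc++.6.dylib": set(),
    "libgcc_s.1.1.dylib": {"___emutls_get_address"},
}

if __name__ == "__main__":
    print("with 12.1-r1:", missing_bindings(ABYSS_BINDINGS, EXPORTS_12_1))
    print("with 12.2-r0:", missing_bindings(ABYSS_BINDINGS, EXPORTS_12_2))
```

In this model the symbol is still available from libgcc_s.1.1.dylib under 12.2-r0, but the old binary looks for it in libstdc++.6.dylib, so the lookup fails. That is why restoring the export on 12.x keeps already-built binaries working.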

Not related to this issue (I can open a separate one if merited), but, speaking of @rpaths, is there any reason why libgcc_s.1 does not have any LC_RPATH commands, despite linking to @rpath/libgcc_s.1.dylib?

❯ otool -L libgcc_s.1.dylib
libgcc_s.1.dylib:
        /usr/local/opt/gcc/lib/gcc/current/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.1.0)
        @rpath/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0, reexport)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
❯ otool -l libgcc_s.1.dylib | rg -A2 LC_RPATH || echo no rpaths
no rpaths

This doesn't seem to be the case for the other libraries, for example:

❯ otool -L libgfortran.dylib
libgfortran.dylib:
        /usr/local/opt/gcc/lib/gcc/current/libgfortran.5.dylib (compatibility version 6.0.0, current version 6.0.0)
        @rpath/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        @rpath/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
❯ otool -l libgfortran.dylib | rg -A2 LC_RPATH
          cmd LC_RPATH
      cmdsize 32
         path @loader_path/ (offset 12)
--
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)

iains commented

Not related to this issue (I can open a separate one if merited), but, speaking of @rpaths, is there any reason why libgcc_s.1 does not have any LC_RPATH commands, despite linking to @rpath/libgcc_s.1.dylib?

libgcc_s.1.dylib is a backwards-compatibility library, that should be only found by exes previously linked using an older version of GCC that used the libgcc_ext stub library to export symbols from the compiler's libgcc_s. [it is somewhat involved, but that's the summary].

libgcc_s.1.dylib forwards libgcc_s.1.1.dylib (and symbols from either libSystem or /usr/lib/libgcc_s.1.dylib on Darwin8, 9 and 10).

Have you any evidence of a problem? (the exe must already have an rpath for the directory that contains libgcc_s.1.dylib to have found it, I wonder if we need a duplicate.)

OTOH, I have no problem adding @loader_path there for the sake of absolute safety.

iains commented

hmmm actually in my build there is

$ otool -lv gcc/libgcc_s.1.dylib |grep -A2 LC_

<snip>

--
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
--

So I wonder what's happening there.

I can now confirm that 12.2-pre-r1 fixes the issues previously reported for ___emutls_get_address. Thanks @iains

Have you any evidence of a problem?

We ran into one when building GCC 12 for Homebrew, but it's also a product of the interaction between an application that does linkage a bit unconventionally and some post-build packaging we do at Homebrew.

More specifically, we did have one application (Julia) that still tried to go looking for libgcc_s.1 even when rebuilt with GCC 12, resulting in an error:

  ERROR: Unable to load dependent library /private/tmp/julia-20220802-62326-1frzjwy/julia-1.7.3/usr/lib/julia/libgcc_s.1.dylib
  Message:dlopen(/private/tmp/julia-20220802-62326-1frzjwy/julia-1.7.3/usr/lib/julia/libgcc_s.1.dylib, 0x000A): Library not loaded: @rpath/libgcc_s.1.1.dylib
    Referenced from: /usr/local/Cellar/gcc/12.1.0/lib/gcc/current/libgcc_s.1.dylib
    Reason: tried: '/private/tmp/julia-20220802-62326-1frzjwy/julia-1.7.3/usr/lib/libgcc_s.1.1.dylib' (no such file), '/private/tmp/julia-20220802-62326-1frzjwy/julia-1.7.3/usr/bin/../lib/libgcc_s.1.1.dylib' (no such file), '/usr/local/lib/libgcc_s.1.1.dylib' (no such file), '/usr/lib/libgcc_s.1.1.dylib' (no such file)

Details here: Homebrew/homebrew-core#106755 (comment). I did try to add @loader_path there, but @fxcoudert was reluctant to tamper with the build configuration.

(the exe must already have an rpath for the directory that contains libgcc_s.1.dylib to have found it, I wonder if we need a duplicate.)

So, that's actually not the case for us, because of the post-build packaging that I mentioned above, which involves rewriting install names to absolute paths.

So you could end up linking to libgcc_s.1.dylib but not have the right LC_RPATH command.

But given that your build seems to have it, this may just be a packaging issue for Homebrew.
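As an illustration of why the missing LC_RPATH matters once install names are rewritten to absolute paths, here is a small Python sketch of how an @rpath install name expands against a list of run paths. It is a simplification under stated assumptions: `rpath_candidates` is a hypothetical helper, the caller supplies whatever run paths apply, and only @loader_path substitution is modelled (real dyld also consults run paths from the rest of the load chain and fallback directories, as the error above shows).

```python
import posixpath


def rpath_candidates(install_name, rpaths, loader_dir):
    """List the paths an @rpath install name could expand to.

    install_name: the dependency as recorded, e.g. "@rpath/libgcc_s.1.1.dylib"
    rpaths:       LC_RPATH entries that apply, in search order
    loader_dir:   directory of the image containing the load command,
                  substituted for @loader_path
    """
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        # Absolute (or otherwise non-@rpath) names are used as they are.
        return [install_name]
    leaf = install_name[len(prefix):]
    candidates = []
    for rpath in rpaths:
        expanded = rpath.replace("@loader_path", loader_dir, 1)
        candidates.append(posixpath.join(expanded, leaf))
    return candidates


if __name__ == "__main__":
    # Directory taken from the "Referenced from:" line of the Julia error.
    gcc_dir = "/usr/local/Cellar/gcc/12.1.0/lib/gcc/current"
    print(rpath_candidates("@rpath/libgcc_s.1.1.dylib", ["@loader_path"], gcc_dir))
    print(rpath_candidates("@rpath/libgcc_s.1.1.dylib", [], gcc_dir))
```

With an @loader_path run path in libgcc_s.1.dylib, the forwarded libgcc_s.1.1.dylib is looked up next to it. With no run paths in the model, no candidate is produced, which corresponds to the situation where only the application's own run paths and the fallback directories get tried.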

iains commented

I will double-check with the 12.2r1 changes, but now that I think more about this, ISTR that this was fixed between 12.1r0 and 12.1r1 .. feel free to open a new issue (we've already clouded this one enough) ..

No need; this seems to not be there anymore in 12.2-r0, which is good enough for us. Thanks! (Apologies; I should've checked our build of that first before making noise here.)

Testing of 12.2-pre-r1 has concluded, and we shipped it in Homebrew (Homebrew/homebrew-core#108516). We have not seen any other issue.

iains commented

this is fixed, but I will leave the report open until the 12.2 release is actually made.