riscv-non-isa/riscv-elf-psabi-doc

Define Initial Exec to Local Exec TLS relaxation

MaskRay opened this issue · 10 comments

#94 tracks TLSDESC (replacement for traditional General Dynamic and Local Dynamic), while this issue tracks Initial Exec to Local Exec relaxation.

For external TLS variable accesses,

extern __thread int var;
int *addr() { return &var; }
int get() { return var; }

Currently gcc (clang is similar) emits:

# -fno-pic, Local Exec
# can cause copy relocation of TLS variables
addr:
        lui     a0,%tprel_hi(var)
        add     a0,a0,tp,%tprel_add(var)
        addi    a0,a0,%tprel_lo(var)
        ret
get:
        lui     a5,%tprel_hi(var)
        add     a5,a5,tp,%tprel_add(var)
        lw      a0,%tprel_lo(var)(a5)
        ret
# -fpie, Initial Exec
addr:
        la.tls.ie       a0,var
        add     a0,a0,tp
        ret
get:
        la.tls.ie       a5,var
        add     a5,a5,tp
        lw      a0,0(a5)
        ret

Emitting Local Exec code sequences in the position dependent (-fno-pic) code is really bad (it makes st_size and st_align part of the ABI. TLS copy relocation needs cooperation from the dynamic loader. According to Rich Felker, R_RISCV_COPY does not work on musl. I think the relevant source is ldso/dynlink.c case REL_COPY:. FreeBSD rtld (libexec/rtld-elf/riscv/reloc.c) do not seem to support it, either. glibc (sysdeps/riscv/dl-machine.h) supports it.

Fixed st_size in the executable means extended array lengths in the shared object does not work.). If we apply riscvarchive/riscv-gcc#118 , we will make the -fno-pic output consistent with the -fpie output.

For both -fno-pic and -fpie, there will be some overhead if the variable var turns out to be defined in the executable. Defining Initial Exec to Local Exec relaxation can eliminate the cost.

This is being discussed in an LLVM issue,
https://reviews.llvm.org/D70398
an FSF Binutils bug,
https://sourceware.org/bugzilla/show_bug.cgi?id=23825
a github riscv-gcc issue,
riscvarchive/riscv-gcc#118
and maybe in other places I haven't found yet.

I believe that this is the correct place to discuss this issue.

Dropping this feature means an ABI change, and we froze the ABI when we upstreamed glibc. However, since we are proposing to change the compiler, there won't be any backwards compatible problem as long as binutils and glibc continue to support the feature for old libraries compiled with the feature. This would mean marking the feature as deprecated instead of immediately dropping it from the ABI, and then sometime later marking it as obsolete. Thus compilers should stop generating it immediately, but linkers and dynamic linkers may need to continue to support it as long as we have old libraries using it.

There is support in glibc for TLS copy relocs, in the sysdeps/riscv/dl-machine.h file, there is a R_RISCV_COPY case, which has a comment
/* Handle TLS copy relocations. */
This does work for simple testcases, once I found and fixed the binutils bug that prevented it from working.

This feature was added to the ABI before I started doing RISC-V work, and I don't know why it is there. I agree that it is an odd feature, and one that no other architecture target supports. But I don't have an opinion about it, as this is outside my primary area of expertise, other than the fact that if we want to drop this we need to deprecate it first as we do have existing systems built using this feature. There do seem to be some good arguments presented for why it should be deprecated.

There are good reasons not to have this "feature" (TLS copy relocs). Copy relocation in general (not just TLS) are problematic in that they can significantly increase memory usage (for non-TLS globals, a worst-case involves copying a large const table from shared text in a shared library to writable data in each instance of an application using the library) and make size of structs/arrays into ABI between the library and application linked to it, thereby precluding extensibility by adding to the end of a struct or enlarging an array.

For TLS, the memory usage issue is magnified by the waste being per-thread rather than per-process (each thread now reserves space for two copies of the copy-reloc'd TLS). And whereas it's been long-known that exporting global structs/arrays from shared libraries could be problematic because of copy relocations, ones with thread-local storage have never before had this problem on any existing architecture. Programmers could rightly use them in the presumed-to-be-extensible way without ABI breakage. Doing this with copy relocations on riscv breaks that property.

Further, TLS with copy relocations clashes with protected-visibility on TLS data. For global data and functions, protected visibility already had poorly-defined semantics and was not reliable, due to copy relocations and PLT definitions. However for TLS, protected visibility presumably did work everywhere. Introducing copy relocations broke that.

I don't think removing copy relocations here is serious breakage. Tooling could keep support for existing programs with them, but it looks like, due to binutils bug 23825, affected binaries were broken anyway. Moreover musl has not supported copy relocations and I would like to not start. I've included the patch from PR 118 above in musl-cross-make for now and hope that will be the upstream decision.

@aswaterman @palmer-dabbelt Do you have any comments here? Do you know why these TLS copy relocs are present?

Yes, it was my apparently naive solution to make LE TLS relocs against extern symbols work. I didn't realize emitting IE relocs and relaxing them was a superior solution. The copy relocs seemed to work and produce decent code, so I never revisited it.

With the benefit of hindsight, I would've done things differently.

I think we should continue supporting the copy relocs in binutils and the dynamic linker for backwards compatibility, but change GCC to stop emitting them, and add IE -> LE relaxation in binutils.

I'm happy with that course of action. From my side (for musl) I'll just leave them unsupported since they have never worked in the past (no code in ldso to do them) and we now have the gcc patches in place to avoid it.

That sounds like a good plan for musl.

Looks like the binutils bug fix made it into 2.33, which means we can't just say "every binary with copy relocs is broken so we can remove it from the ABI". Given that, I think the best way forward here is to say the old scheme is deprecated. That'll allow us to avoid adding support to new projects (ie, musl) but we'll need to keep around support for anything that may have worked in the past (ie, glibc).

I wasn't aware that the fix in binutils was already working right, since the issue on the tracker is still open. Indeed it may need to be supported indefinitely into the future on glibc if that's the case. Can the fix on the GCC side (not emitting local-exec here) be fast-tracked to make it into the next GCC release so that we at least minimize the proliferation of the problem? It would be nice not to be worrying many years down the line that folks might still have riscv toolchains where TLS causes the copy-reloc-related ABI issues with libraries.

@jim-wilson knows better than I do about both of those, as he's the one who wrote the binutils patch and posted the WIP GCC branch. AFIAK we don't know of any brokenness with the binutils patch, but IIRC the goal was just to squash the glibc test suite failure so it's not like anyone's looked that closely.

The binutils bug report is still open because I think there is a better way to fix it that I haven't tried yet, which is stuck on my to do list with hundreds of other things. But it is working. It fixed 2 glibc testsuite failures, and is in a few distros already. And has been in riscv-gnu-toolchain for a while.

The gcc patch is trivial. We can add it anytime. I'll want to make sure there are no binutils/gcc/glibc testsuite regressions but that is just some cpu time for builds. My riscv machine is tied up at the moment tryinjg to reproduce Andreas Schwab's binutils bug report for a llvm builds.

I think the hard part now is that someone has to write a patch to fix the text in this document.