NixOS/patchelf

patchelf breaks dylibs from recent Firefox Nightly builds

K900 opened this issue · 11 comments

K900 commented

Describe the bug

After using patchelf --set-rpath on a library from a recent Firefox Nightly build, the library can no longer be loaded because it segfaults the linker.

Steps To Reproduce

Expected behavior

No segfault.

patchelf --version output

Attempted both default nixpkgs 0.15.0 and current nixpkgs patchelfUnstable (c401289).

Additional context

This seems to have been caused by upstream enabling some kind of advanced linker wizardry called "relrhack": https://hg.mozilla.org/mozilla-central/rev/032b87ff55061bcbdc7a85d9e18fde814797073a

The last build before that commit works fine.

The problem is that the source code uses the _DYNAMIC symbol, which translates to the binary code accessing the .dynamic section at a fixed address. But patchelf moves it, and puts something else where it used to be, so the code reads garbage.

K900 commented

Well this is fun. So I guess we need a custom fixup for this...

Smaller (independent) reproducer:

#include <stdio.h>
#include <elf.h>

extern Elf64_Dyn _DYNAMIC[];

int main() {
	for (Elf64_Dyn* dyn = _DYNAMIC; dyn->d_tag != DT_NULL; dyn++) {
		printf("%lx %p\n", (unsigned long)dyn->d_tag, (void *)dyn->d_un.d_ptr);
	}
	return 0;
}
  • compile with gcc -o test test.c
  • run ./test
  • patchelf --set-rpath foo test
  • run ./test again. It will show garbage.

K900 commented

Yeah sounds like we just have to special case that symbol. Not that it's not already special cased by the linker...

The symbol is not used in Firefox's case. It uses the address directly.

Actually, even in the small reproducer, the symbol is not used at runtime.

K900 commented

So I guess we have two issues here - we still need to handle _DYNAMIC correctly AND we need to figure out what to do about Firefox...

Actually, removing https://github.com/NixOS/patchelf/blob/master/src/patchelf.cc#L674 makes it work, because patchelf doesn't actually put another section where .dynamic used to be. It only overwrites its content with garbage.

K900 commented

Oh, I actually thought we moved the sections properly and was going to try this as a workaround tomorrow.

What is the original reasoning behind overwriting the old sections with Zs? Just to reduce confusion?

The no-clobber workaround seems less than ideal, since it means that code referencing _DYNAMIC is using an old copy of the dynamic table, which is likely to be different from the dynamic table in the new PT_DYNAMIC segment.
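
One way to check at runtime whether _DYNAMIC still points at the live dynamic table is to compare it with the PT_DYNAMIC address the loader reports through glibc's dl_iterate_phdr. This is only a sketch: the helper names are hypothetical, and it assumes glibc, where the first object visited is the main program, so it covers an executable like the small reproducer rather than a shared library like Firefox's.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Linker-defined symbol: where .dynamic was placed at link time. */
extern ElfW(Dyn) _DYNAMIC[];

/* Hypothetical callback: records the PT_DYNAMIC address of the first object
 * visited, which on glibc is the main program. */
static int find_main_dynamic(struct dl_phdr_info *info, size_t size, void *data)
{
	(void)size;
	for (size_t i = 0; i < info->dlpi_phnum; i++) {
		if (info->dlpi_phdr[i].p_type == PT_DYNAMIC) {
			*(const ElfW(Dyn) **)data = (const ElfW(Dyn) *)
				(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
			break;
		}
	}
	return 1; /* non-zero stops the iteration after the first object */
}

/* Returns 1 if _DYNAMIC differs from the table named by PT_DYNAMIC. */
static int dynamic_symbol_is_stale(void)
{
	const ElfW(Dyn) *live = NULL;
	dl_iterate_phdr(find_main_dynamic, &live);
	return live != NULL && live != _DYNAMIC;
}
```

On an unpatched binary the two addresses should match. If patchelf has moved .dynamic, they would be expected to differ, whether or not the old copy was clobbered.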

The relrhack is in https://github.com/mozilla/gecko-dev/blob/58c532751054863dbb9d277051d63e1e7e77929e/build/unix/elfhack/inject.c#L184. This could be changed to use __ehdr_start and e_phoff to find the PT_DYNAMIC program header (the same function already does this to find PT_GNU_RELRO).
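
A minimal sketch of that suggestion, not the actual inject.c code: the function name is hypothetical. It assumes the program header table is mapped at __ehdr_start + e_phoff, and that the PT_LOAD segment with file offset 0 is the one containing the ELF header.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Linker-defined symbols: the mapped ELF header and the link-time .dynamic. */
extern const ElfW(Ehdr) __ehdr_start __attribute__((visibility("hidden")));
extern ElfW(Dyn) _DYNAMIC[] __attribute__((visibility("hidden")));

/* Hypothetical helper: locate the dynamic table through PT_DYNAMIC rather
 * than through the _DYNAMIC symbol. */
static const ElfW(Dyn) *find_dynamic_via_phdr(void)
{
	const uintptr_t ehdr_addr = (uintptr_t)&__ehdr_start;
	/* Assumption: the program headers are mapped at ehdr + e_phoff. */
	const ElfW(Phdr) *phdr =
		(const ElfW(Phdr) *)(ehdr_addr + __ehdr_start.e_phoff);
	const ElfW(Phdr) *dynamic = NULL;
	const ElfW(Phdr) *first_load = NULL;

	for (size_t i = 0; i < __ehdr_start.e_phnum; i++) {
		if (phdr[i].p_type == PT_DYNAMIC)
			dynamic = &phdr[i];
		else if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset == 0)
			first_load = &phdr[i]; /* segment holding the ELF header */
	}
	if (dynamic == NULL || first_load == NULL)
		return NULL;

	/* Load bias: runtime address of the header minus its link-time vaddr. */
	const uintptr_t load_bias = ehdr_addr - first_load->p_vaddr;
	return (const ElfW(Dyn) *)(load_bias + dynamic->p_vaddr);
}
```

Because patchelf updates the PT_DYNAMIC program header when it moves the section, a lookup like this should follow the table to its new location, provided the assumptions above still hold for the patched file. In an unpatched binary, find_dynamic_via_phdr() should return the same address as _DYNAMIC.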