ClangBuiltLinux/linux

LTO crash related to fortify

nickdesaulniers opened this issue · 13 comments

https://github.com/ClangBuiltLinux/continuous-integration2/runs/3854281995?check_suite_focus=true

During LTO, we observe the following ICE:

Global is external, but doesn't have external or weak linkage!
i64 ()* @strlen.inline

cc @serge-sans-paille.

cc @vishalbhoj: I tried the reproduction command from the logs, but observe:

$ tuxmake --target-arch=x86_64 --kconfig=gki_defconfig --toolchain=clang-nightly --wrapper=none --environment=KBUILD_BUILD_TIMESTAMP=@1633913936 --environment=KBUILD_BUILD_USER=tuxmake --environment=KBUILD_BUILD_HOST=tuxmake --runtime=podman --image=855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly LLVM=1 LLVM_IAS=1 config default kernel modules
# to reproduce this build locally: tuxmake --target-arch=x86_64 --kconfig=gki_defconfig --toolchain=clang-nightly --wrapper=none --environment=KBUILD_BUILD_TIMESTAMP=@1633913936 --environment=KBUILD_BUILD_USER=tuxmake --environment=KBUILD_BUILD_HOST=tuxmake --runtime=podman --image=855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly LLVM=1 LLVM_IAS=1 config default kernel modules
WARN[0000] The cgroupv2 manager is set to systemd but there is no systemd user session available 
WARN[0000] For using systemd, you may need to login using an user session 
WARN[0000] Alternatively, you can enable lingering with: `loginctl enable-linger 366559` (possibly as root) 
WARN[0000] Falling back to --cgroup-manager=cgroupfs    
ERRO[0000] cannot find UID/GID for user ndesaulniers: No subuid ranges found for user "ndesaulniers" in /etc/subuid - check rootless mode in man pages. 
WARN[0000] using rootless single mapping into the namespace. This might break some images. Check /etc/subuid and /etc/subgid for adding sub*ids 
WARN[0000] The cgroupv2 manager is set to systemd but there is no systemd user session available 
WARN[0000] For using systemd, you may need to login using an user session 
WARN[0000] Alternatively, you can enable lingering with: `loginctl enable-linger 366559` (possibly as root) 
WARN[0000] Falling back to --cgroup-manager=cgroupfs    
Trying to pull 855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly:latest...
Error: initializing source docker://855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly:latest: reading manifest latest in 855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly: unauthorized: authentication required
E: Runtime preparation failed: failed to pull remote image 855116176053.dkr.ecr.us-east-1.amazonaws.com/tuxmake/x86_64_clang-nightly

Hmm... I couldn't reproduce locally with the following change and build command:

diff --git a/build.config.common b/build.config.common
index 08dd1323144a..e082326494d0 100644
--- a/build.config.common
+++ b/build.config.common
@@ -3,7 +3,7 @@ KMI_GENERATION=0
 
 LLVM=1
 DEPMOD=depmod
-CLANG_PREBUILT_BIN=prebuilts/clang/host/linux-x86/clang-r433403/bin
+CLANG_PREBUILT_BIN=:/android0/llvm-project/llvm/build/bin
 BUILDTOOLS_PREBUILT_BIN=build/build-tools/path/linux-x86
DTC=${ROOT_DIR}/${BUILDTOOLS_PREBUILT_BIN}/dtc

$ BUILD_CONFIG=common/build.config.gki.x86_64 ./build/build.sh

Perhaps the Debian build of LLVM is behind again.

reproducible at a4bccf7afdd0
not reproducible at 0f0e31cf511def3e92244e615b2646c1fd0df0cd

So the Debian LLVM is out of date.

@nickdesaulniers Please use the tuxmake_reproducer.sh available in the build:
https://builds.tuxbuild.com/1zL4EdaZ4Z8tmA1mieSKQH1Re6m/tuxmake_reproducer.sh

The one in the logs uses a docker mirror hosted inside AWS and is not accessible to everyone.

Thanks @vishalbhoj, who also notes on IRC:

The clang version info is available in the metadata.json generated by tuxmake, for example: https://builds.tuxbuild.com/1zL4EdaZ4Z8tmA1mieSKQH1Re6m/metadata.json. The version used is Debian clang version 14.0.0-++20211008071712+7aebdfc4fcc4-1exp120211008052520.214

Our CI is still seeing this with an updated Debian clang and I can reproduce it locally at llvm/llvm-project@5f668bb.

I was able to come up with the following reproducer with the help of cvise:

$ cat string.i
extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long strlen(const char *s) { return 0; }
void strim_s() {
    const char *s;
    strlen(s);
}
unsigned long strlen(const char *s) { return 0; }

$ clang -flto -g -O2 -c -o string.o string.i

$ llvm-ar rcsTD string.a string.o

$ ld.lld -m elf_x86_64 -r -o vmlinux.o --whole-archive string.a
Global is external, but doesn't have external or weak linkage!
void ()* @strlen.inline
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ld.lld -m elf_x86_64 -r -o vmlinux.o --whole-archive string.a
 #0 0x00000000015ed0c3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x15ed0c3)
 #1 0x00000000015eae7e llvm::sys::RunSignalHandlers() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x15eae7e)
 #2 0x00000000015ed6da SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fc1be146870 __restore_rt sigaction.c:0:0
 #4 0x00007fc1bda92d22 raise (/usr/lib/libc.so.6+0x3cd22)
 #5 0x00007fc1bda7c862 abort (/usr/lib/libc.so.6+0x26862)
 #6 0x000000000155eca8 (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x155eca8)
 #7 0x000000000155eac6 (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x155eac6)
 #8 0x0000000003782462 (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x3782462)
 #9 0x000000000221b92c llvm::lto::LTO::addRegularLTO(llvm::BitcodeModule, llvm::ArrayRef<llvm::lto::InputFile::Symbol>, llvm::lto::SymbolResolution const*&, llvm::lto::SymbolResolution const*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x221b92c)
#10 0x000000000221adca llvm::lto::LTO::addModule(llvm::lto::InputFile&, unsigned int, llvm::lto::SymbolResolution const*&, llvm::lto::SymbolResolution const*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x221adca)
#11 0x000000000221a8f3 llvm::lto::LTO::add(std::unique_ptr<llvm::lto::InputFile, std::default_delete<llvm::lto::InputFile> >, llvm::ArrayRef<llvm::lto::SymbolResolution>) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x221a8f3)
#12 0x0000000001790a5e lld::elf::BitcodeCompiler::add(lld::elf::BitcodeFile&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x1790a5e)
#13 0x00000000016eea7e void lld::elf::LinkerDriver::compileBitcodeFiles<llvm::object::ELFType<(llvm::support::endianness)1, true> >() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x16eea7e)
#14 0x00000000016d7615 void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::support::endianness)1, true> >(llvm::opt::InputArgList&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x16d7615)
#15 0x00000000016c8d99 lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x16c8d99)
#16 0x00000000016c67d4 lld::elf::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x16c67d4)
#17 0x0000000001540a0f lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) lld.cpp:0:0
#18 0x00000000015402fd main (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x15402fd)
#19 0x00007fc1bda7db25 __libc_start_main (/usr/lib/libc.so.6+0x27b25)
#20 0x000000000153ffee _start (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/lld+0x153ffee)

Slightly reduced command line:

$ clang -flto -g -O2 -c bar.c
$ ld.lld -r bar.o

Note that without -g, I get a different failed assertion:

ld.lld: ../lib/Linker/IRMover.cpp:1088: llvm::Error (anonymous namespace)::IRLinker::linkFunctionBody(llvm::Function &, llvm::Function &): Assertion `Dst.isDeclaration() && !Src.isDeclaration()' failed.
...
 #9 0x00000000035ca1ec (anonymous namespace)::IRLinker::linkGlobalValueBody(llvm::GlobalValue&, llvm::GlobalValue&) IRMover.cpp:0:0
#10 0x00000000035c97d1 (anonymous namespace)::IRLinker::materialize(llvm::Value*, bool) IRMover.cpp:0:0
#11 0x0000000003c5fefd (anonymous namespace)::Mapper::mapValue(llvm::Value const*) ValueMapper.cpp:0:0
#12 0x0000000003c611a1 (anonymous namespace)::Mapper::remapInstruction(llvm::Instruction*) ValueMapper.cpp:0:0
#13 0x0000000003c61a01 (anonymous namespace)::Mapper::remapFunction(llvm::Function&) ValueMapper.cpp:0:0
...

The generated lib/string.o in the kernel build doesn't validate with opt -verify:

extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long strlen() {}

int strim_s() {
  long size = strlen(strim_s);
  !size;
}

unsigned long strlen() {}

$ clang -flto -O2 string.i -emit-llvm -S -o string.ll
$ opt -verify string.ll
opt: string.ll:20:9: error: invalid linkage for function declaration
declare internal fastcc i64 @strlen.inline() unnamed_addr #2
        ^

I think that second declaration is messing things up.
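
For context on the error above: both messages correspond to the rule that a function without a body (a declaration) may only have external or extern_weak linkage, which is why `declare internal ... @strlen.inline` is rejected. A toy model of that rule in C, assuming a simplified linkage enum. The type and function names here are hypothetical and are not LLVM's API:

```c
#include <stdbool.h>

/* Simplified subset of linkage kinds, for illustration only. */
enum toy_linkage {
    TOY_EXTERNAL,
    TOY_EXTERN_WEAK,
    TOY_INTERNAL,
    TOY_PRIVATE
};

/* Hypothetical stand-in for a global in a module. */
struct toy_global {
    const char *name;
    enum toy_linkage linkage;
    bool has_body; /* false means the global is only declared */
};

/*
 * A definition may carry any linkage. A declaration must be resolvable
 * from another module, so only external or extern_weak is acceptable.
 */
static bool toy_linkage_ok(const struct toy_global *g)
{
    if (g->has_body)
        return true;
    return g->linkage == TOY_EXTERNAL || g->linkage == TOY_EXTERN_WEAK;
}
```

In this model, `@strlen.inline` from the output above would be `{ "strlen.inline", TOY_INTERNAL, false }`, which `toy_linkage_ok` rejects: the symbol was given internal linkage but its body was never emitted.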

The kernel code is basically this: https://gist.github.com/nickdesaulniers/b69d319880b8a3b721740e5fc0785dcd and doesn't need LTO to reproduce the issue:

$ clang -O2 foo.c -c -o foo.o
Global is external, but doesn't have external or weak linkage!
i64 ()* @strlen.inline
fatal error: error in backend: Broken module found, compilation aborted!

Thanks for the reproducer. I'm on it.

@nickdesaulniers your test case is interesting because it contains two different definitions of the same function (the fortified inline one and the non-fortified one). GCC always picks the non-fortified one, whereas Clang picks the fortified one at the call site in strim but then doesn't generate its body.
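
The shape being described can be sketched as below, using a hypothetical `demo_len` rather than `strlen` so that compiler builtin handling is not involved. The `assert` is only a stand-in for a fortify check. Both bodies deliberately return the same value, because which one ends up at the call site depends on the compiler, as noted above:

```c
#include <assert.h>
#include <stddef.h>

/*
 * First definition: "extern inline" with gnu_inline means this body is
 * meant only for inlining, and no out-of-line copy is emitted from it.
 * Stand-in for the fortified wrapper.
 */
extern inline __attribute__((always_inline)) __attribute__((gnu_inline))
size_t demo_len(const char *s)
{
    size_t n = 0;

    assert(s != NULL); /* placeholder for the fortify check */
    while (s[n] != '\0')
        n++;
    return n;
}

/* Caller placed between the two definitions, like strim_s() in the reproducer. */
size_t demo_caller(const char *s)
{
    return demo_len(s);
}

/*
 * Second definition: the ordinary out-of-line function. GNU inline
 * semantics allow this redefinition after an extern inline one.
 * Stand-in for the non-fortified version.
 */
size_t demo_len(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Compiled as C (not C++), `demo_caller("hello")` should yield 5 whichever definition the compiler selects. The crash in this issue arises when the compiler refers to the inline variant at the call site without ever emitting a body for it.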

@serge-sans-paille thanks for the patch: https://reviews.llvm.org/D112059. Reviewing/testing now.