android/ndk

clang version string harms reproducibility with meta-compile information


Description

The Android NDK sets a CLANG_VENDOR string that adds some extra detail about the compiler. As of r26, this string also includes some options that (as far as I can tell) were used to compile the compiler itself, including +/- markers for things like bolt and lto, rather than features that change how the compiler behaves.

https://android.googlesource.com/toolchain/llvm_android/+/31a1d3747b77b10185c0adf03ae6036b474719c7/do_build.py#1123

The result is that copies of the compiler built for different host operating systems now get different vendor strings even though they function identically, which makes it harder to obtain bit-for-bit reproducible builds with the Android NDK compiler stack.

Is there a reason why the compiler version string needs to be annotated with this information? I'd have thought it would only be interesting to annotate things that result in different behavior, but these are just options that affect the performance of the compiler itself...

(FWIW, I am passing -fno-ident, but for some reason, if you compile a .s file, as OpenSSL does, the clang driver passes -dwarf-debug-producer to -cc1as with the full clang version, and I can't figure out any way to keep this out of the .debug_info section without fully disabling -g.)

# cat a.s
.text

.cfi_startproc
        adcl    $0,%edi
.cfi_endproc
# /usr/local/lib/android/sdk/ndk/27.0.12077973/toolchains/llvm/prebuilt/linux-x86_64/bin/clang -gfull -fno-ident -Os -c -o a.o a.s -v
Android (12027248, +pgo, +bolt, +lto, +mlgo, based on r522817) clang version 18.0.1 (https://android.googlesource.com/toolchain/llvm-project d8003a456d14a3deb8054cdaa529ffbf02d9b262)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/lib/android/sdk/ndk/27.0.12077973/toolchains/llvm/prebuilt/linux-x86_64/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/13
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/14
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/14
Candidate multilib: .;@m64
Selected multilib: .;@m64
clang: warning: argument unused during compilation: '-fno-ident' [-Wunused-command-line-argument]
 (in-process)
 "/home/saurik/android/sdk/ndk/27.0.12077973/toolchains/llvm/prebuilt/linux-x86_64/bin/clang" -cc1as -triple x86_64-unknown-linux-gnu -filetype obj -main-file-name a.s -target-cpu x86-64 -fdebug-compilation-dir=/home/saurik/llvm_android -dwarf-debug-producer "Android (12027248, +pgo, +bolt, +lto, +mlgo, based on r522817) clang version 18.0.1 (https://android.googlesource.com/toolchain/llvm-project d8003a456d14a3deb8054cdaa529ffbf02d9b262)" -debug-info-kind=constructor -dwarf-version=5 -mrelocation-model pic -object-file-name=/home/saurik/llvm_android/a.o -o a.o a.s
# strings a.o
/home/saurik/llvm_android
Android (12027248, +pgo, +bolt, +lto, +mlgo, based on r522817) clang version 18.0.1 (https://android.googlesource.com/toolchain/llvm-project d8003a456d14a3deb8054cdaa529ffbf02d9b262)
/home/saurik/llvm_android
.debug_abbrev
.text
.rela.debug_aranges
.debug_line_str
.rela.debug_info
.rela.debug_line
.rela.eh_frame
.strtab
.symtab
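
To check which build flags ended up in a given object file without reading `strings` output by eye, something like the following could be used. This is only a sketch: the pattern is inferred from the vendor string shown in the transcript above (build id, a list of +/- flags, then "based on rNNN"), not from any NDK documentation, and `find_vendor_strings` is a hypothetical helper name. It also assumes the debug sections are not compressed, so the string appears verbatim in the file.

```python
import re

# Shape of the Android vendor prefix as seen in this issue's output.
# This is an assumption based on one example, not a documented format.
VENDOR_RE = re.compile(
    rb"Android \((\d+)((?:, [+-][a-z]+)*), based on (r\d+)\)"
)


def find_vendor_strings(data: bytes):
    """Return (build_id, flags, base_revision) for each vendor string in data."""
    results = []
    for match in VENDOR_RE.finditer(data):
        # Split the ", +pgo, +bolt, ..." group into individual flags.
        flags = [f.decode() for f in re.findall(rb"[+-][a-z]+", match.group(2))]
        results.append((match.group(1).decode(), flags, match.group(3).decode()))
    return results


# Usage (path is illustrative):
#   with open("a.o", "rb") as fh:
#       print(find_vendor_strings(fh.read()))
```

Running it over the raw bytes of an object file lists each embedded producer string along with the +/- flags it carries.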

Upstream bug

No response

Commit to cherry-pick

No response

Affected versions

r27

Canary version

No response

Host OS

Linux, Mac

Host OS version

Debian sid, macOS 14.4.1

Affected ABIs

arm64-v8a, x86_64

For example, my Linux and macOS (host; same target) builds of the same code differ by nothing other than these embedded flags.

-rw-r--r-- 1 root root  25064 Nov 10 22:17 x86_64/openssl/crypto/aes/aesni-sha256-x86_64.o
-rw-r--r-- 1  501 staff 25064 Nov 10 19:47 x86_64/openssl/crypto/aes/aesni-sha256-x86_64.o
1138,1139c1138,1139
< 00004710: 3732 3438 2c20 2b70 676f 2c20 2b62 6f6c  7248, +pgo, +bol
< 00004720: 742c 202b 6c74 6f2c 202b 6d6c 676f 2c20  t, +lto, +mlgo, 
---
> 00004710: 3732 3438 2c20 2b70 676f 2c20 2d62 6f6c  7248, +pgo, -bol
> 00004720: 742c 202b 6c74 6f2c 202d 6d6c 676f 2c20  t, +lto, -mlgo, 
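
One way to confirm that two builds differ only in these flags is to mask the +/- signs before comparing. The sketch below assumes the vendor string has the shape shown in this issue and appears uncompressed in the file; `mask_build_flags` and `same_modulo_flags` are hypothetical helper names. It is a diagnostic aid only and does nothing to make the objects themselves identical.

```python
import re

# Matches the flag list inside the Android vendor string, in the shape seen
# in this issue (an assumption, not a documented format).
FLAGS_RE = re.compile(
    rb"(Android \(\d+)((?:, [+-][a-z]+)+)(, based on r\d+\))"
)


def mask_build_flags(data: bytes) -> bytes:
    """Replace the '+'/'-' of each build flag with '?', preserving length."""
    def repl(match):
        masked = re.sub(rb"[+-](?=[a-z])", b"?", match.group(2))
        return match.group(1) + masked + match.group(3)
    return FLAGS_RE.sub(repl, data)


def same_modulo_flags(a: bytes, b: bytes) -> bool:
    """True if the two blobs are identical once the flag signs are masked."""
    return mask_build_flags(a) == mask_build_flags(b)


# Usage (paths are illustrative):
#   from pathlib import Path
#   linux = Path("linux/aesni-sha256-x86_64.o").read_bytes()
#   mac = Path("macos/aesni-sha256-x86_64.o").read_bytes()
#   print(same_modulo_flags(linux, mac))
```

Because only the sign bytes are replaced, file offsets stay the same, so any remaining difference can still be located with a normal hex diff of the masked outputs.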

This is mainly for bookkeeping, to ensure the optimizations are not silently dropped. @kongy - can we track these separately from the version string?