google/pprof

Symbolization fails with a build ID mismatch error when the profiled application is run by invoking the dynamic linker directly

Closed · 1 comment

What version of pprof are you using?

813a5fb

What operating system and processor architecture are you using?

CPU: amd64
Host OS: Linux 50e76dec012c 5.15.153.1-microsoft-standard-WSL2
Container images: debian:11 and rockylinux:8 (details below).

What did you do?

Our app is written in Rust, and we use rust-jemalloc-pprof to get a heap profile in pprof format. The profile is not symbolized, so we provide the exact same binary and *.so files to pprof to symbolize it, but symbolization fails.

The root cause seems to be that the app is started by invoking the dynamic linker directly, for example entrypoint: ["./dylib/ld-linux-x86-64.so.2", "--library-path", "./dylib", "./rust-pprof-test"]. Due to some internal constraints, we build the Rust app in debian:11, copy the binary and all its dependencies to rockylinux:8, and run the app by explicitly specifying the linker and the library path. I understand that this is a very irregular setup, but is there any way to obtain a symbolized profile in such a scenario?

Steps to reproduce:

  1. Clone this repo https://github.com/ykadowak/rust-pprof-test on a Linux/amd64 machine.
  2. Run these commands in the root directory of the repository:
docker compose up -d

docker compose exec rocky /bin/bash
# inside rockylinux:8 image
curl localhost:3000/debug/pprof/heap > heap.pb.gz
pprof ./heap.pb.gz # cannot symbolize
# you can also try things like
PPROF_BINARY_PATH=./dylib PPROF_TOOLS=./binutils pprof ./rust-pprof-test ./heap.pb.gz
exit

docker compose exec debian /bin/bash
# inside debian:11 image
curl localhost:3000/debug/pprof/heap > heap.pb.gz
pprof ./heap.pb.gz # cannot symbolize
exit

# Uncomment this line: https://github.com/ykadowak/rust-pprof-test/blob/f535eff2268dbf23fa304f851c95c4e275627386/compose.yaml#L16 to run the app directly.
docker compose stop
docker compose up -d

docker compose exec debian /bin/bash
# inside debian:11 image
curl localhost:3000/debug/pprof/heap > heap.pb.gz
pprof ./heap.pb.gz # can symbolize this time

What did you expect to see?

The profile is symbolized.

What did you see instead?

When the app is run by invoking the dynamic linker directly, pprof reports a build ID mismatch error, as shown below:

[root@33b7bd08a8e5 app]# pprof -raw ./heap.pb.gz 
Local symbolization failed for ld-linux-x86-64.so.2 (build ID 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65): build ID mismatch
Comment: executableInfo=3;0;0
Comment: executableInfo=3;1c000;1c000
Comment: executableInfo=3;1c2000;1c2000
Comment: executableInfo=3;2280b8;2290b8
Comment: executableInfo=3;0;0
Comment: executableInfo=3;0;0
Comment: executableInfo=3;3000;3000
Comment: executableInfo=3;14000;14000
Comment: executableInfo=3;17dc8;18dc8
Comment: executableInfo=3;0;0
Comment: executableInfo=3;6000;6000
Comment: executableInfo=3;16000;16000
Comment: executableInfo=3;1bc08;1cc08
Comment: executableInfo=3;0;0
Comment: executableInfo=3;d000;d000
Comment: executableInfo=3;a7000;a7000
Comment: executableInfo=3;141d80;142d80
Comment: executableInfo=3;0;0
Comment: executableInfo=3;1000;1000
Comment: executableInfo=3;3000;3000
Comment: executableInfo=3;3d70;4d70
Comment: executableInfo=3;0;0
Comment: executableInfo=3;22000;22000
Comment: executableInfo=3;17b000;17b000
Comment: executableInfo=3;1c9768;1ca768
Comment: executableInfo=3;0;0
Comment: executableInfo=3;1000;1000
Comment: executableInfo=3;21000;21000
Comment: executableInfo=3;294c0;2a4c0
PeriodType: space bytes
Period: 0
Time: 2024-08-25 15:45:13.383007519 +0000 UTC
Samples:
inuse_space/bytes
    4195715: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Locations
     1: 0x7f48fb4b86ac M=2 
     2: 0x7f48fb4b8bbc M=2 
     3: 0x7f48fb4aa394 M=2 
     4: 0x7f48fb43752f M=2 
     5: 0x7f48fb4051aa M=2 
     6: 0x7f48fb405483 M=2 
     7: 0x7f48fb3f006a M=2 
     8: 0x7f48fb3ecbd3 M=2 
     9: 0x7f48fb3f163a M=2 
    10: 0x7f48fb3f8d02 M=2 
    11: 0x7f48fb3f5f88 M=2 
    12: 0x7f48fb53284c M=2 
    13: 0x7f48fb3f181b M=2 
    14: 0x7f48fb06dd09 M=23 __libc_start_main ??:?:0:0 s=0
    15: 0x7f48fb3dfec9 M=2 
Mappings
1: 0x7f48fb3a6000/0x7f48fb3c17d0/0x0 /app/dylib/ld-linux-x86-64.so.2 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65 
2: 0x7f48fb3c2000/0x7f48fb5674f9/0x1c000 /app/dylib/ld-linux-x86-64.so.2 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65 
3: 0x7f48fb568000/0x7f48fb5cd8f0/0x1c2000 /app/dylib/ld-linux-x86-64.so.2 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65 
4: 0x7f48fb5cf0b8/0x7f48fb8019e0/0x2280b8 /app/dylib/ld-linux-x86-64.so.2 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65 
5: 0x7ffd35996000/0x7ffd35996ce5/0x0 linux-vdso.so.1 f4c596200ed8d0e245960ef7d54a281f43f20530 
6: 0x7f48fb38a000/0x7f48fb38c898/0x0 ./dylib/libgcc_s.so.1 596409bc4e94583ef18f141c9b941a46540868ee 
7: 0x7f48fb38d000/0x7f48fb39db69/0x3000 ./dylib/libgcc_s.so.1 596409bc4e94583ef18f141c9b941a46540868ee 
8: 0x7f48fb39e000/0x7f48fb3a131c/0x14000 ./dylib/libgcc_s.so.1 596409bc4e94583ef18f141c9b941a46540868ee 
9: 0x7f48fb3a2dc8/0x7f48fb3a3448/0x17dc8 ./dylib/libgcc_s.so.1 596409bc4e94583ef18f141c9b941a46540868ee 
10: 0x7f48fb368000/0x7f48fb36d9e8/0x0 ./dylib/libpthread.so.0 255e355c207aba91a59ae1f808e3b4da443abf0c 
11: 0x7f48fb36e000/0x7f48fb37d0ad/0x6000 ./dylib/libpthread.so.0 255e355c207aba91a59ae1f808e3b4da443abf0c 
12: 0x7f48fb37e000/0x7f48fb3837d4/0x16000 ./dylib/libpthread.so.0 255e355c207aba91a59ae1f808e3b4da443abf0c 
13: 0x7f48fb384c08/0x7f48fb389470/0x1bc08 ./dylib/libpthread.so.0 255e355c207aba91a59ae1f808e3b4da443abf0c 
14: 0x7f48fb224000/0x7f48fb230278/0x0 ./dylib/libm.so.6 1d6ff6c4c69f3572486bc27b8290ee932b0b9f39 
15: 0x7f48fb231000/0x7f48fb2caca1/0xd000 ./dylib/libm.so.6 1d6ff6c4c69f3572486bc27b8290ee932b0b9f39 
16: 0x7f48fb2cb000/0x7f48fb3652c4/0xa7000 ./dylib/libm.so.6 1d6ff6c4c69f3572486bc27b8290ee932b0b9f39 
17: 0x7f48fb366d80/0x7f48fb367110/0x141d80 ./dylib/libm.so.6 1d6ff6c4c69f3572486bc27b8290ee932b0b9f39 
18: 0x7f48fb21e000/0x7f48fb21edb8/0x0 ./dylib/libdl.so.2 46b3bf3f9b9eb092a5c0cf5575e89092f768054c 
19: 0x7f48fb21f000/0x7f48fb220051/0x1000 ./dylib/libdl.so.2 46b3bf3f9b9eb092a5c0cf5575e89092f768054c 
20: 0x7f48fb221000/0x7f48fb2216e8/0x3000 ./dylib/libdl.so.2 46b3bf3f9b9eb092a5c0cf5575e89092f768054c 
21: 0x7f48fb222d70/0x7f48fb223110/0x3d70 ./dylib/libdl.so.2 46b3bf3f9b9eb092a5c0cf5575e89092f768054c 
22: 0x7f48fb04a000/0x7f48fb06b488/0x0 ./dylib/libc.so.6 2b86a1968781038c0766b17c1ea11a2a71d7d907 
23: 0x7f48fb06c000/0x7f48fb1c4ecc/0x22000 ./dylib/libc.so.6 2b86a1968781038c0766b17c1ea11a2a71d7d907 [FN][FL][IN]
24: 0x7f48fb1c5000/0x7f48fb2135f4/0x17b000 ./dylib/libc.so.6 2b86a1968781038c0766b17c1ea11a2a71d7d907 
25: 0x7f48fb214768/0x7f48fb21d680/0x1c9768 ./dylib/libc.so.6 2b86a1968781038c0766b17c1ea11a2a71d7d907 
26: 0x7f48fb802000/0x7f48fb802f68/0x0 ./dylib/ld-linux-x86-64.so.2 1b3277a419c3fa42b199e5a170ea215b32689793 
27: 0x7f48fb803000/0x7f48fb8222d0/0x1000 ./dylib/ld-linux-x86-64.so.2 1b3277a419c3fa42b199e5a170ea215b32689793 
28: 0x7f48fb823000/0x7f48fb82aca4/0x21000 ./dylib/ld-linux-x86-64.so.2 1b3277a419c3fa42b199e5a170ea215b32689793 
29: 0x7f48fb82c4c0/0x7f48fb82e178/0x294c0 ./dylib/ld-linux-x86-64.so.2 1b3277a419c3fa42b199e5a170ea215b32689793 
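
As a cross-check on the dump above, the following is a minimal sketch that converts the first sampled address into a file offset and compares it with how far the second ld-linux-x86-64.so.2 entry (mappings 26 to 29) extends into its file. It assumes the three slash-separated numbers on each mapping line are start, limit, and file offset; the helper names `file_offset` and `mapped_extent` are made up for illustration and are not part of pprof.

```python
# Values copied from the `pprof -raw` output above.
# Each tuple is (start, limit, file_offset), assumed to be the three
# slash-separated numbers printed on a "Mappings" line.
MAPPING_2 = (0x7F48FB3C2000, 0x7F48FB5674F9, 0x1C000)   # labelled /app/dylib/ld-linux-x86-64.so.2
MAPPING_29 = (0x7F48FB82C4C0, 0x7F48FB82E178, 0x294C0)  # last segment of ./dylib/ld-linux-x86-64.so.2
LOCATION_1 = 0x7F48FB4B86AC                             # "1: 0x7f48fb4b86ac M=2"


def file_offset(addr, mapping):
    """Translate a runtime address into an offset within the mapped file."""
    start, limit, offset = mapping
    if not (start <= addr < limit):
        raise ValueError(f"{addr:#x} is outside {start:#x}-{limit:#x}")
    return addr - start + offset


def mapped_extent(mapping):
    """File offset just past the last byte covered by this mapping."""
    start, limit, offset = mapping
    return offset + (limit - start)


sample_offset = file_offset(LOCATION_1, MAPPING_2)
loader_extent = mapped_extent(MAPPING_29)
print(f"location 1 file offset:        {sample_offset:#x}")
print(f"ld-linux mapped extent (M=29): {loader_extent:#x}")
print("offset beyond that extent:", sample_offset > loader_extent)
```

The sampled offset falls well past the end of the region that mappings 26 to 29 cover, which suggests that mappings 1 to 4 describe a different, larger file than the one their path names.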

After more research, it seems the root cause is that rust-jemalloc-pprof constructs the Mapping field incorrectly in our use case. Sorry for the fuss.
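
For anyone hitting a similar error, here is a rough sketch of one way to spot this kind of inconsistency in `pprof -raw` output: group the mapping lines by file basename and flag any file that appears with more than one build ID. The function name `conflicting_build_ids` and the regular expression are assumptions made for illustration, and this is only a heuristic, not how pprof itself checks build IDs. That the first group of ld-linux-x86-64.so.2 mappings really belongs to the main binary is my guess and is not confirmed here.

```python
import os
import re
from collections import defaultdict

# Matches a mapping line such as:
#   "2: 0x7f.../0x7f.../0x1c000 /path/to/file <hex build ID> [flags]"
# Location lines have no start/limit/offset triple, so they do not match.
MAPPING_RE = re.compile(
    r"^\s*(\d+):\s+0x[0-9a-f]+/0x[0-9a-f]+/0x[0-9a-f]+\s+(\S+)\s+([0-9a-f]+)\b"
)


def conflicting_build_ids(raw_text):
    """Return {basename: set of build IDs} for files listed with more than one build ID."""
    ids_by_name = defaultdict(set)
    for line in raw_text.splitlines():
        match = MAPPING_RE.match(line)
        if match:
            _, path, build_id = match.groups()
            ids_by_name[os.path.basename(path)].add(build_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Three lines taken from the dump in this issue.
SAMPLE = "\n".join([
    "1: 0x7f48fb3a6000/0x7f48fb3c17d0/0x0 /app/dylib/ld-linux-x86-64.so.2 7914137f6c04cbb6c7ec4ecb6295b5462c4a6c65",
    "23: 0x7f48fb06c000/0x7f48fb1c4ecc/0x22000 ./dylib/libc.so.6 2b86a1968781038c0766b17c1ea11a2a71d7d907 [FN][FL][IN]",
    "26: 0x7f48fb802000/0x7f48fb802f68/0x0 ./dylib/ld-linux-x86-64.so.2 1b3277a419c3fa42b199e5a170ea215b32689793",
])

for name, ids in conflicting_build_ids(SAMPLE).items():
    print(f"{name} is listed with {len(ids)} different build IDs: {sorted(ids)}")
```

On the dump in this issue, the only file flagged is ld-linux-x86-64.so.2, which is the same file named in the "Local symbolization failed" message.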