[bug] apply_reg_state may crash on entering and exiting stack frames on Aarch64
gdh1995 opened this issue · 3 comments
Environment
- libunwind version: master (2024-07-23 14:17 e3f88e8)
- architecture: Aarch64 - ARMv8 Processor rev 1 (v8l)
- OS: Nvidia Orin Drive system - Ubuntu 20.04.5 LTS
- crossing-compiled using
clang 8.0.1-9
fromx86_64-pc-linux-gnu
platform - here's my
config.h
: I disabledlibgcc
mode and enabledUNW_LOCAL_ONLY
. libunwind_config.zip
Description
- In
src/dwarf/Gparser.c
,apply_reg_state
callsdwarf_get (c, c->loc[rs->ret_addr_column], &ip);
. - But in some rare cases
c->loc[rs->ret_addr_column]
may be wrong and point to unmapped memory, like0x18
or0xd555...555d
- Then it can crash on dereferencing the pointer.
Reproduction Steps
Background: I want to use cpu-profiler in gperftools 2.15 to collect CPU performance data, but it almost always crashes at apply_reg_state
.
During my tests, libunwind
often fails to get a correct "next CFA / next IP" in its apply_reg_state
, especially for the below function (see the snapshot below - sorry I can not provide a minimized reproducer):
malloc
is a customized wrapper which callsjemalloc
and and replaces the default one inglibc
libunwind
gets wrong CFA / IP ifSIGPERF
hits0xa163194
,0xa163198
,0xa16319c
and0xa163250
- once a
CFA
is wrong, it often reports more wrong IPs on furtherunw_step
, or just reports a same IP and then stops with some errors.
- once a
- sometimes
libunwind
fails on triggering SIGPERF at a few other functions- their IP can be something like
0xffff993160ac
- usually the
malloc
function are called fromoperator new ()
at an address of0xffff9f3eab4c
- sorry but I don't know how to dump ASM code of such sections
- their IP can be something like
- FYI, during the test above,
0xffff9f58f7c0
is the IP of asigreturn
frame
I've noticed that all the IPs of my customized malloc
share a same keypoint - their LR
, SP
and stack values do not "match".
But I know little about DWARF, and although I spent tens of hours on src/dwarf/
I still failed when tried to understand what find_reg_state
does.
My tests and logs
I compiled libunwind
by myself, so that I may add syscall(SYS_write)
to log something and debug it. Here is the crashing scene:
[gperf] ==== start get stack trace ====
[step] input: child-cfa=0xfffde41c31d0 , ip=0x4cc0c7c , ip.is_signal_frame=0
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffde41c3dd0) + 16
> 0xfffde41c31d8 [30] = 0xfffde41c3b58
- [30]: where[]=CFAREL ; 0xfffde41c3dd8 == (new-cfa=0xfffde41c3de0) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c3dd8
[step] input: child-cfa=0xfffde41c3de0 , ip=0x4cc05c8 , ip.is_signal_frame=0
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffde41c3e10) + 16
> 0xfffde41c31d8 [30] = 0xfffde41c3dd8
- [30]: where[]=CFAREL ; 0xfffde41c3e18 == (new-cfa=0xfffde41c3e20) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c3e18
[step] input: child-cfa=0xfffde41c3e20 , ip=0x4cbf50c , ip.is_signal_frame=0
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffde41c4740) + 16
> 0xfffde41c31d8 [30] = 0xfffde41c3e18
- [30]: where[]=CFAREL ; 0xfffde41c4748 == (new-cfa=0xfffde41c4750) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c4748
[step] input: child-cfa=0xfffde41c4750 , ip=0x4cbc494 , ip.is_signal_frame=0
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffde41c4790) + 16
> 0xfffde41c31d8 [30] = 0xfffde41c4748
- [30]: where[]=CFAREL ; 0xfffde41c4798 == (new-cfa=0xfffde41c47a0) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c4798
[step] input: child-cfa=0xfffde41c47a0 , ip=0xffff9f58f7c0 , ip.is_signal_frame=1
[step] input: child-cfa=0xfffde41c5a00 , ip=0xa16319c , ip.is_signal_frame=0
- cfa @ regnum := reg.val=31 , regnum=31 , ret=0 , (cfa=0xfffde41c5a00) + 0
> 0xfffde41c31d8 [30] = 0xfffde41c49c8
- [30]: where[]=1 ; 0xfffde41c49c8 == (new-cfa=0xfffde41c5a00) + 0, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c49c8
[step] input: child-cfa=0xfffde41c5a00 , ip=0xffff9f3eab4c , ip.is_signal_frame=0
- cfa @ sp := (c->cfa=0xfffde41c5a00) + 32
> 0xfffde41c31d8 [30] = 0xfffde41c49c8
- [30]: where[]=CFAREL ; 0xfffde41c5a08 == (new-cfa=0xfffde41c5a20) + -24, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffde41c5a08
[step] input: child-cfa=0xfffde41c5a20 , ip=0xa144544 , ip.is_signal_frame=0
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0x8) + 16
> 0xfffde41c31d8 [30] = 0xfffde41c5a08
- [30]: where[]=CFAREL ; 0x10 == (new-cfa=0x18) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0x10
*** SIGSEGV received at time=1722648362 on cpu 2 ***
PC: @ 0x4cc6c38 (unknown) apply_reg_state
@ 0x9f5b404 208 absl::lts_20230125::WriteFailureInfo()
@ 0x9f5b118 64 absl::lts_20230125::AbslFailureSignalHandler()
@ 0xffff9f58f7c0 5952 (unknown)
@ 0x4cc65f4 2048 _ULaarch64_dwarf_step
@ 0x4cc5b78 384 _ULaarch64_step
@ 0x4cc0ce8 3088 GetStackTraceWithContext_libunwind()
@ 0x4cc05c8 64 GetStackTraceWithContext()
@ 0x4cbf50c 2352 CpuProfiler::prof_handler()
@ 0x4cbc494 80 ProfileHandler::SignalHandler()
@ 0xffff9f58f7c0 4704 (unknown)
@ 0xffff9f3eab4c 64 operator new()
@ 0xa02acb4 48 google::protobuf::Arena::CreateMaybeMessage<>()
@ 0x5b644ec 64 google::protobuf::internal::RepeatedPtrFieldBase::MergeFromInnerLoop<>()
Explain:
Hello, today I recorded the CIE、FDE blocks and Debug logs during find_reg_state
:
# CIE (3 bytes)
0C 1F 00
# FDE (15 bytes)
50, 0C 1D 10, 9E 02, 9D 04, 93 06, 94 08, 95 0C, 00
The below is from a crash (SIGPERF on offset 4):
>get_rs_cache: acquiring lock
>run_cfi_program: run cfi: start-ip = 0x0 , end-ip = 0xffffffffffffffff , start-addr = 0x4974f6d , end-addr = 0x4974f70
>run_cfi_program: before reading op: ip = 0x0 , addr = 0x4974f6d
>run_cfi_program: after reading op: ip = 0x0, addr = 0x4974f6e, op = 12
>run_cfi_program: CFA_def_cfa r31+0x0
>run_cfi_program: run cfi: start-ip = 0xa172c50 , end-ip = 0xa172c54 , start-addr = 0x4c88859 , end-addr = 0x4c88868
>run_cfi_program: before reading op: ip = 0xa172c50 , addr = 0x4c88859
>run_cfi_program: after reading op: ip = 0xa172c50, addr = 0x4c8885a, op = 80
>run_cfi_program: CFA_advance_loc to 0xa172c60
>put_rs_cache: unmasking signals/interrupts and releasing lock
The below is from a succeeding SIGPERF on malloc
:
[step] input: child-cfa=0xfffdd5bc4470 , ip=0xffff96edc7c0 , ip.is_signal_frame=1
[step] input: child-cfa=0xfffdd5bc56d0 , ip=0xa172c74 , ip.is_signal_frame=0
>get_rs_cache: acquiring lock
>run_cfi_program: run cfi: start-ip = 0x0 , end-ip = 0xffffffffffffffff , start-addr = 0x4974f6d , end-addr = 0x4974f70
>run_cfi_program: before reading op: ip = 0x0 , addr = 0x4974f6d
>run_cfi_program: after reading op: ip = 0x0, addr = 0x4974f6e, op = 12
>run_cfi_program: CFA_def_cfa r31+0x0
>run_cfi_program: run cfi: start-ip = 0xa172c50 , end-ip = 0xa172c74 , start-addr = 0x4c88859 , end-addr = 0x4c88868
>run_cfi_program: before reading op: ip = 0xa172c50 , addr = 0x4c88859
>run_cfi_program: after reading op: ip = 0xa172c50, addr = 0x4c8885a, op = 80
>run_cfi_program: CFA_advance_loc to 0xa172c60
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c8885a
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c8885b, op = 12
>run_cfi_program: CFA_def_cfa r29+0x10
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c8885d
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c8885e, op = 158
>run_cfi_program: CFA_offset r30 at cfa+0xfffffffffffffff8
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c8885f
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c88860, op = 157
>run_cfi_program: CFA_offset r29 at cfa+0xfffffffffffffff0
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c88861
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c88862, op = 147
>run_cfi_program: CFA_offset r19 at cfa+0xffffffffffffffe8
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c88863
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c88864, op = 148
>run_cfi_program: CFA_offset r20 at cfa+0xffffffffffffffe0
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c88865
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c88866, op = 149
>run_cfi_program: CFA_offset r21 at cfa+0xffffffffffffffd0
>run_cfi_program: before reading op: ip = 0xa172c60 , addr = 0x4c88867
>run_cfi_program: after reading op: ip = 0xa172c60, addr = 0x4c88868, op = 0
>put_rs_cache: unmasking signals/interrupts and releasing lock
- cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffdd5bc56f0) + 16
> 0xfffdd5bc2ea8 [30] = 0xfffdd5bc4698
- [30]: where[]=CFAREL ; 0xfffdd5bc56f8 == (new-cfa=0xfffdd5bc5700) + -8, cfa_type=3
<< ip . operator& () = loc[30] = 0xfffdd5bc56f8
[step] input: child-cfa=0xfffdd5bc5700 , ip=0xffff96d37b4c , ip.is_signal_frame=0
Obviously, the root cause is that partial states (before x29 gest updated and after x29 gets restored) are not handled correctly when parsing FDE data.
But I haven't tested the same C/C++ code on x86-64, so I'm not sure why a same program works very well and never crashes on x86-64.
Sorry my last comment is not so correct - although libunwind can't get a correct SP
(CFA) before or after FP (x29)
has a callee's value, it doesn't affect IP
values and further CFA
in most cases.
Today I noticed the FDE code of new
looks a bit strange: it makes cfa @ sp := (c->cfa=0xfffde41c5a00) + 32
instead of cfa @ regnum := reg.val=29 , regnum=29 , ret=0 , (cfa=0xfffdd5bc56f0) + 16
.
So, I dumped new
code and FDE:
>run_cfi_program: run cfi: start-ip = 0x0 , end-ip = 0xffffffffffffffff ,
start-addr = 0xffff91708669 , end-addr = 0xffff9170866c , align = [4, -8]
>run_cfi_program: CFA_def_cfa r31+0x0
>run_cfi_program: run cfi: start-ip = 0xffff91617b30 , end-ip = 0xffff91617b4b ,
start-addr = 0xffff9170a131 , end-addr = 0xffff9170a148 , align = [4, -8]
>run_cfi_program: CFA_advance_loc to 0xffff91617b34
>run_cfi_program: CFA_def_cfa_offset 0x20
>run_cfi_program: CFA_offset r29 at cfa+0xffffffffffffffe0
>run_cfi_program: CFA_offset r30 at cfa+0xffffffffffffffe8
>run_cfi_program: CFA_advance_loc to 0xffff91617b40
>run_cfi_program: CFA_offset r19 at cfa+0xfffffffffffffff0
>run_cfi_program: CFA_advance_loc to 0xffff91617b58
That means, the code body of new
has FP := SP := PreviousSP - 0x20
.
I think in such a case (new -> malloc -> SIGPERF
) libunwind may infer PreviousSP
from not SP
but FP
. But, the master code says:
As a special-case, if the stack-pointer is the CFA and the
stack-pointer wasn't saved, popping the CFA implicitly pops
the stack-pointer as well.
I wonder why and whether we may add a further more special case or not.
==========
Added:
This is the new
dumped directly from /usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.28
:
FYI: I made a test of recognizing the pattern, and crashes become fewer
- that a function
A
has SP-based CFA and calls another function B andB
is SIGPROF-ed. - When the pattern matches, I replace
SP+offset
withx29
-x29 data offset
(aka. the value ofCFA_offset(x29)
) - The new code works "somehow well" on such a case - FP, LR and IP are always correct then
- only CFA (and stack frame size) may be wrong.
But this doesn't solve the case of SIGPROF-ing on IP = 0xa163250
(malloc
is returning)
- my program is made by clang-8 with
-fno-omit-frame-pointer
, and currently it looks like: - all functions are compiled using
str ** [sp, #**]!
and thenadd x29, sp, #**
, but notstp x29, x30, [sp, #**]!
- all
FDE
functions begins with something likeCFA_advance_loc(16)
, but no instr cares abouthow a function returns
- therefore, during returning
X29
may point to FP of a parent function and then cause libunwind skips the next frame- updated: I think we have to parse
access_mem(IP - 4/8/12)
to recognizeldp x29, x30, [sp, #**]
. Any other idea?
- updated: I think we have to parse
CFA
will be wrong, sofollowing functions which haveSP-based
CFA may cause crashes- updated: the above line is wrong - because of auto-correcting SP-based CFA into FP-based, it shouldn't crash.