Inline hash lookup routine should preserve CPSR
alirazeen opened this issue · 4 comments
There are some internal routines inserted by the compiler (GCC, in my case), that require the CPSR to be kept intact across function calls. Consider the following two routines:
0003f290 <__aeabi_cdcmpeq>:
3f290: e92d4001 push {r0, lr}
3f294: ebffffd4 bl 3f1ec <__cmpdf2>
3f298: e3500000 cmp r0, #0
3f29c: 43700000 cmnmi r0, #0
3f2a0: e8bd8001 pop {r0, pc}
0003f2a4 <__aeabi_dcmpeq>:
3f2a4: e52de008 str lr, [sp, #-8]!
3f2a8: ebfffff8 bl 3f290 <__aeabi_cdcmpeq>
3f2ac: 03a00001 moveq r0, #1
3f2b0: 13a00000 movne r0, #0
3f2b4: e49df008 ldr pc, [sp], #8
In this example, __aeabi_dcmpeq
calls __aeabi_cdcmpeq
. After __aeabi_cdcmpeq
returns, __aeabi_dcmpeq
executes a couple of conditional mov
statements. These conditional statements are based on the CPSR flags as updated by the cmp
and cmn
statements in __aeabi_cdcmpeq
.
If MAMBO is compiled with the DBM_D_INLINE_HASH
flag, it will insert a hash lookup routine when it counters the pop {r0, pc}
statement in 3f2a0
. Since the hash lookup implementation uses cmp
, too, the CPSR flags are clobbered and the rest of the execution becomes faulty when __aeabi_cdcmpeq
returns.
I am using an older version of MAMBO (before commit 9b09670). But I suspect this will be an issue in the current version, too.
Thanks for the report. The behavior is indeed unchanged in the current HEAD. This is a pretty big problem because writing to the CPSR is very slow (it flushes the pipeline) and CB(N)Z (the only conditional branch which doesn't use the CPSR) is limited to forward jumps in Thumb (T32) and not available at all in ARM (A32).
Fixing this might take a while, because I'll have to write and then benchmark the possible implementations across a few microarchitectures to see if there's any way not to degrade performance too much. If you just need a working solution soon, let me know and I'll push one in a separate branch.
Thanks! Yes, the MRS
/MSR
operations are indeed slow.
You can take your time to fix this bug. I have my own fork of MAMBO (which is hacked up in a lot of ways), and I have a local fix which works. I need it because I have encountered an Android app that triggers this bug. However, I think it should be fixed in the best way possible.
I wonder if there is a way to preserve CPSR selectively. The procedure call standard does say:
The N, Z, C, V and Q bits (bits 27-31) and the GE[3:0] bits (bits 16-19) are undefined on entry to or return from a public interface.
In theory, this is only a problem for internal compiler routines, and hand-written assembly. Perhaps a good signal would be "is there an unconditional branch immediately after an instruction that updates the CPSR?" If so, the CPSR should be preserved.
Edit: I guess this is not enough to cover all cases. Technically, we have to check if the next basic block uses the CPSR.
Closing due to inactivity. Please re-open if required.