beehive-lab/mambo

Inline hash lookup routine should preserve CPSR

alirazeen opened this issue · 4 comments

There are some internal routines inserted by the compiler (GCC, in my case) that require the CPSR flags to be kept intact across function calls. Consider the following two routines:

0003f290 <__aeabi_cdcmpeq>:
   3f290:       e92d4001        push    {r0, lr}
   3f294:       ebffffd4        bl      3f1ec <__cmpdf2>
   3f298:       e3500000        cmp     r0, #0
   3f29c:       43700000        cmnmi   r0, #0
   3f2a0:       e8bd8001        pop     {r0, pc}

0003f2a4 <__aeabi_dcmpeq>:
   3f2a4:       e52de008        str     lr, [sp, #-8]!
   3f2a8:       ebfffff8        bl      3f290 <__aeabi_cdcmpeq>
   3f2ac:       03a00001        moveq   r0, #1
   3f2b0:       13a00000        movne   r0, #0
   3f2b4:       e49df008        ldr     pc, [sp], #8

In this example, __aeabi_dcmpeq calls __aeabi_cdcmpeq. After __aeabi_cdcmpeq returns, __aeabi_dcmpeq executes two conditional mov instructions, which depend on the CPSR flags as set by the cmp and cmnmi instructions in __aeabi_cdcmpeq.

If MAMBO is compiled with the DBM_D_INLINE_HASH flag, it inserts a hash lookup routine when it encounters the pop {r0, pc} instruction at 3f2a0. Since the hash lookup implementation also uses cmp, the CPSR flags are clobbered and execution becomes faulty once __aeabi_cdcmpeq returns.
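For reference, a plain double comparison is enough to trigger this code pattern. The sketch below is illustrative (the function name is made up): when built with a soft-float ARM EABI toolchain, GCC lowers the `==` to a call to the `__aeabi_dcmpeq` helper shown in the disassembly above, and the caller reads the result out of the CPSR flags.

```c
/* On a soft-float ARM EABI target (e.g. arm-linux-gnueabi GCC), this
 * equality test compiles to a call to the runtime helper __aeabi_dcmpeq,
 * whose result the caller materializes from the CPSR flags -- the exact
 * pattern shown in the disassembly above. On hard-float or non-ARM
 * targets the same source compiles to a native comparison instead. */
int doubles_equal(double a, double b) {
    return a == b;
}
```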

I am using an older version of MAMBO (before commit 9b09670). But I suspect this will be an issue in the current version, too.

lgeek commented

Thanks for the report. The behavior is indeed unchanged in the current HEAD. This is a pretty big problem because writing to the CPSR is very slow (it flushes the pipeline) and CB(N)Z (the only conditional branch which doesn't use the CPSR) is limited to forward jumps in Thumb (T32) and not available at all in ARM (A32).

Fixing this might take a while, because I'll have to write and then benchmark the possible implementations across a few microarchitectures to see if there's any way not to degrade performance too much. If you just need a working solution soon, let me know and I'll push one in a separate branch.
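For completeness, the straightforward (but slow) implementation is to bracket the inline hash lookup with an MRS/MSR pair. This is only a sketch: the scratch register choice is illustrative, not MAMBO's actual register allocation.

```asm
@ Sketch only: preserve the application's flags across the inline hash
@ lookup. MRS/MSR are architecturally correct but serialize the pipeline,
@ which is exactly the cost discussed above.
    mrs   r5, cpsr          @ save N, Z, C, V, Q, GE into a scratch reg
    @ ... inline hash lookup, free to use cmp/cmn ...
    msr   cpsr_f, r5        @ restore only the flags field
```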

alirazeen commented

Thanks! Yes, the MRS/MSR operations are indeed slow.

You can take your time to fix this bug. I have my own fork of MAMBO (which is hacked up in a lot of ways), and I have a local fix which works. I need it because I have encountered an Android app that triggers this bug. However, I think it should be fixed in the best way possible.

I wonder if there is a way to preserve CPSR selectively. The procedure call standard does say:

The N, Z, C, V and Q bits (bits 27-31) and the GE[3:0] bits (bits 16-19) are undefined on entry to or return from a public interface.

In theory, this is only a problem for internal compiler routines and hand-written assembly. Perhaps a good signal would be "is there an unconditional branch immediately after an instruction that updates the CPSR?" If so, the CPSR should be preserved.

Edit: I guess this is not enough to cover all cases. Technically, we have to check whether the next basic block reads the CPSR flags before overwriting them.
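That per-block check amounts to a simple flags-liveness scan. The sketch below works on a hypothetical decoded-instruction summary (`insn_info` and its fields are made up for illustration; MAMBO's real decoder structures differ): the flags are live on entry if some instruction reads them before any instruction overwrites them, and the conservative answer when the block ends inconclusively is to preserve them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction summary; MAMBO's real decoder API differs. */
typedef struct {
    bool reads_flags;   /* conditionally executed, or ADC/SBC/etc. */
    bool writes_flags;  /* S-suffixed, CMP, CMN, TST, TEQ, MSR */
    bool is_branch;     /* ends the basic block */
} insn_info;

/* Returns true if the CPSR flags are live on entry to this basic block,
 * i.e. some instruction reads them before any instruction overwrites
 * them. Checking reads first also handles instructions that do both
 * (e.g. a conditional S-suffixed instruction). If the block ends without
 * settling the question, conservatively report the flags as live. */
bool flags_live_on_entry(const insn_info *insns, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (insns[i].reads_flags)
            return true;          /* read before any write: live */
        if (insns[i].writes_flags)
            return false;         /* overwritten first: dead */
        if (insns[i].is_branch)
            break;                /* fell off the block: unknown */
    }
    return true;                  /* conservative default */
}
```

On the __aeabi_dcmpeq continuation above, the first instruction (moveq) reads the flags, so the scan would correctly demand preservation.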

Closing due to inactivity. Please re-open if required.