angr/pyvex

Confused about ITSTATE removal when translating a single instruction to VEX in Thumb mode

xrivendell7 opened this issue · 6 comments

I want to translate a single ARM Thumb instruction to VEX IR, e.g. mov r0, r7.
However, the lifted IR is much richer than a single MOV instruction, as shown below:

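>>> import pyvex
>>> import archinfo
>>> import codecs
>>> arch = archinfo.ArchARMEL()  # assumed; not shown in the original session
>>> addr = 0x8048000             # matches the IMark address below
>>> a = codecs.decode("3846",'hex')
>>> i = pyvex.lift(a,addr+1,arch,bytes_offset=1)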
>>> i.pp()
IRSB {
   t0:Ity_I32 t1:Ity_I32 t2:Ity_I32 t3:Ity_I32 t4:Ity_I32 t5:Ity_I32 t6:Ity_I32 t7:Ity_I32 t8:Ity_I32 t9:Ity_I32 t10:Ity_I32 t11:Ity_I32 t12:Ity_I32 t13:Ity_I32 t14:Ity_I32 t15:Ity_I32 t16:Ity_I1 t17:Ity_I32 t18:Ity_I32 t19:Ity_I32 t20:Ity_I32 t21:Ity_I32 t22:Ity_I1 t23:Ity_I32

   00 | ------ IMark(0x8048000, 2, 1) ------
   01 | t0 = GET:I32(itstate)
   02 | t1 = Shr32(t0,0x08)
   03 | PUT(itstate) = t1
   04 | t9 = And32(t0,0x000000f0)
   05 | t8 = Xor32(t9,0x000000e0)
   06 | t10 = GET:I32(cc_op)
   07 | t7 = Or32(t10,t8)
   08 | t11 = GET:I32(cc_dep1)
   09 | t12 = GET:I32(cc_dep2)
   10 | t13 = GET:I32(cc_ndep)
   11 | t14 = armg_calculate_condition(t7,t11,t12,t13):Ity_I32
   12 | t16 = CmpNE32(t9,0x00000000)
   13 | t15 = ITE(t16,t14,0x00000001)
   14 | t6 = GET:I32(r7)
   15 | t21 = GET:I32(r0)
   16 | t22 = CmpNE32(t15,0x00000000)
   17 | t20 = ITE(t22,t6,t21)
   18 | PUT(r0) = t20
   NEXT: PUT(pc) = 0x08048003; Ijk_Boring
}
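Read as ordinary code, the block above is a guarded move. A rough Python model of statements 01 to 18 is sketched here; the function and parameter names are made up for illustration, and the result of armg_calculate_condition is passed in as a plain argument rather than computed from the cc_* fields.

```python
def guarded_mov(itstate, cond_result, r0, r7):
    """Model IR statements 01-18 of the block above.

    cond_result stands in for the value armg_calculate_condition would
    return (0 or 1); the real helper derives it from cc_op/cc_dep1/cc_dep2/cc_ndep.
    """
    new_itstate = itstate >> 8   # 02-03: shift out the lane for this instruction
    lane = itstate & 0xF0        # 04: condition bits of the current lane
    # 12-13: a zero lane means "not inside an IT block", so always execute
    guard = cond_result if lane != 0 else 1
    # 16-18: r0 receives r7 only when the guard is set, else keeps its value
    new_r0 = r7 if guard != 0 else r0
    return new_r0, new_itstate


print(guarded_mov(0x00, 0, r0=1, r7=7))  # (7, 0): outside an IT block
print(guarded_mov(0x10, 0, r0=1, r7=7))  # (1, 0): inside one, condition false
```

With itstate equal to zero the whole thing collapses to r0 = r7, which is the plain MOV semantics the question is after.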

From issue 196, I understand that this is caused by the ITSTATE check in libvex, and that setting allow_arch_optimizations = True may avoid it.
RQ1: How do I set allow_arch_optimizations? I'm a little confused by the source code. Should I pass non-bytes data to the lifter? How should I use this option?

    if isinstance(data, (bytes, bytearray, memoryview)):
        py_data = data
        c_data = None
        allow_arch_optimizations = False
    else:
        if max_bytes is None:
            raise PyVEXError("Cannot lift block with ffi pointer and no size (max_bytes is None)")
        c_data = data
        py_data = None
        allow_arch_optimizations = True

RQ2: My ultimate goal is to translate a single Thumb instruction such as mov and strip the ITE, armg_calculate_flag_v and other (possibly) redundant parts, keeping only the important MOV semantics. Do you have any suggestions for that? Is there a trick similar to, for example, placing a NOP after a MIPS branch instruction so that VEX can translate it?
Thank you!

In order to safely remove the ITE checks, vex must be able to scan backwards in the binary to determine if there are any IT instructions. Because of this, you must pass a cffi pointer to the instruction data in the context of the rest of the binary in order for allow_arch_optimizations to be turned on.
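A minimal sketch of what passing a cffi pointer could look like, assuming pyvex exposes its cffi instance as pyvex.ffi and that the target is little-endian ARM. The helper name lift_in_context, the filler bytes, and the base address are all hypothetical, and the result has not been checked against any particular pyvex version.

```python
def lift_in_context(code, insn_off, base):
    """Lift the Thumb instruction at code[insn_off:] via a cffi pointer.

    `code` is the surrounding binary as bytes, `base` its load address.
    """
    # Imported lazily so the helper can be defined without pyvex installed.
    import archinfo
    import pyvex

    buf = pyvex.ffi.from_buffer(code)   # cffi view; `code` must stay alive
    return pyvex.lift(
        buf + insn_off,                 # pointer into the buffer, not a copy
        base + insn_off + 1,            # odd address selects Thumb mode
        archinfo.ArchARMEL(),           # assumption: little-endian ARM
        max_bytes=2,                    # mandatory when passing a pointer
        bytes_offset=1,
    )


# Hypothetical stand-in for "the rest of the binary": filler code followed
# by the MOV from the question.
code = bytes.fromhex("00bf" * 16 + "3846")
base = 0x8048000
insn_off = len(code) - 2
```

Calling lift_in_context(code, insn_off, base).pp() would then take the non-bytes branch of the snippet quoted in the question, which is the one that sets allow_arch_optimizations = True.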

I don't recommend this, but you can also try checking out #197 which lets you control the option explicitly. I haven't merged it because the correct way to manage the optimization is above, but it is available to you.

Sorry to bother you again. However, force_optimize=True / allow_arch_optimizations = True does not seem to help with the problem:

>>> a = codecs.decode("3846",'hex')
>>> i = pyvex.lift(a,addr+1,arch,force_optimize=True)
>>> i.pp()
IRSB {
   t0:Ity_I32 t1:Ity_I32 t2:Ity_I32 t3:Ity_I32 t4:Ity_I32 t5:Ity_I32 t6:Ity_I32 t7:Ity_I32 t8:Ity_I32 t9:Ity_I1 t10:Ity_I32 t11:Ity_I32 t12:Ity_I32 t13:Ity_I32 t14:Ity_I32 t15:Ity_I32 t16:Ity_I32 t17:Ity_I32 t18:Ity_I32 t19:Ity_I1 t20:Ity_I32 t21:Ity_I32 t22:Ity_I32 t23:Ity_I32 t24:Ity_I32 t25:Ity_I32 t26:Ity_I1 t27:Ity_I32 t28:Ity_I32 t29:Ity_I32 t30:Ity_I32 t31:Ity_I32 t32:Ity_I32 t33:Ity_I32 t34:Ity_I32 t35:Ity_I32

   00 | ------ IMark(0x8048000, 2, 1) ------
   01 | t0 = GET:I32(itstate)
   02 | t1 = Shr32(t0,0x08)
   03 | PUT(itstate) = t1
   04 | t12 = And32(t0,0x000000f0)
   05 | t11 = Xor32(t12,0x000000e0)
   06 | t13 = GET:I32(cc_op)
   07 | t10 = Or32(t13,t11)
   08 | t14 = GET:I32(cc_dep1)
   09 | t15 = GET:I32(cc_dep2)
   10 | t16 = GET:I32(cc_ndep)
   11 | t17 = armg_calculate_condition(t10,t14,t15,t16):Ity_I32
   12 | t19 = CmpNE32(t12,0x00000000)
   13 | t18 = ITE(t19,t17,0x00000001)
   14 | t22 = And32(t0,0x00000001)
   15 | t21 = Xor32(t22,0x00000001)
   16 | t5 = And32(t21,t18)
   17 | t6 = GET:I32(r0)
   18 | t25 = Sub32(t6,0x000000ff)
   19 | t26 = CmpNE32(t18,0x00000000)
   20 | t23 = ITE(t26,t25,t6)
   21 | PUT(r0) = t23
   22 | t9 = CmpNE32(t5,0x00000000)
   23 | t27 = ITE(t9,0x00000002,t13)
   24 | PUT(cc_op) = t27
   25 | t29 = ITE(t9,t6,t14)
   26 | PUT(cc_dep1) = t29
   27 | t31 = ITE(t9,0x000000ff,t15)
   28 | PUT(cc_dep2) = t31
   29 | t33 = ITE(t9,0x00000000,t16)
   30 | PUT(cc_ndep) = t33
   NEXT: PUT(pc) = 0x08048003; Ijk_Boring
}

Maybe the problem is on my side: I just want to use pyvex on its own to convert assembly to VEX one instruction at a time for my research. However, you seem to treat it as just a part of angr...

My two cents: Doing many analyses safely (in this case, removing itstate) requires more than just analyzing a single instruction locally. Implementing such analyses in angr is usually way easier than implementing them in PyVEX, since angr provides enough scaffolding for developing global analyses. Redoing all the work that angr covers in PyVEX is just inefficient for us.
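To make the "more than a single instruction" point concrete, here is a rough sketch of the kind of backward scan involved. It relies only on the Thumb encoding of IT (0xBFxy with a non-zero mask y) and on the fact that an IT block covers at most four following instructions, each of which may be 32 bits wide. The function names and the default lookback are illustrative, not what VEX itself does.

```python
import struct


def is_it_halfword(hw):
    """True if a 16-bit Thumb halfword has the IT encoding 0xBFxy with a
    non-zero mask y (with y == 0 the halfword is a hint such as NOP)."""
    return (hw & 0xFF00) == 0xBF00 and (hw & 0x000F) != 0


def may_be_in_it_block(code, insn_off, lookback=8):
    """Scan `lookback` halfwords before insn_off for an IT instruction.

    Returns True when an IT-like halfword is seen or when there is not
    enough preceding code to rule one out, i.e. whenever dropping the
    guard would be unsafe. The second half of a 32-bit instruction can
    look like IT, so this errs on the conservative side.
    """
    start = insn_off - 2 * lookback
    if start < 0:
        return True  # not enough context: stay conservative
    halfwords = struct.unpack_from("<%dH" % lookback, code, start)
    return any(is_it_halfword(hw) for hw in halfwords)


mov = bytes.fromhex("3846")
print(may_be_in_it_block(mov, 0))                                       # True
print(may_be_in_it_block(bytes.fromhex("00bf" * 8) + mov, 16))          # False
print(may_be_in_it_block(bytes.fromhex("00bf" * 7 + "08bf") + mov, 16)) # True
```

The first call is the situation in this issue: two bytes with no context give the scan nothing to look at, so the guard has to stay.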

In this specific case, an easy workaround is to prepend several nop instructions to your instruction. As far as I remember, the VEX ARM lifter does have logic for removing unnecessary itstates, but it does not apply to the first three instructions of each block.

The logic is here (my interpretation was probably off though): https://github.com/angr/vex/blob/master/priv/guest_arm_toIR.c#L19180
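The mechanics of the nop-padding workaround could look roughly like the sketch below. The helper names, the default count of three, and the choice of 0xBF00 as the Thumb NOP are assumptions for illustration. Whether the guards actually disappear has not been verified here, and the bytes-versus-pointer distinction quoted earlier in the thread may still apply.

```python
THUMB_NOP = bytes.fromhex("00bf")  # 0xBF00 stored little-endian


def pad_with_nops(insn, count=3):
    """Return (padded_bytes, offset_of_insn) with `count` NOPs in front."""
    pad = THUMB_NOP * count
    return pad + insn, len(pad)


def last_insn_stmts(irsb):
    """Statements that follow the final IMark of a lifted block."""
    import pyvex  # lazy import: only needed once a block has been lifted

    marks = [i for i, s in enumerate(irsb.statements)
             if isinstance(s, pyvex.stmt.IMark)]
    return irsb.statements[marks[-1] + 1:]


def lift_padded(insn, base=0x8048000, count=3):
    """Lift NOPs + insn as one Thumb block, keep only the IR of insn."""
    import archinfo
    import pyvex

    data, _ = pad_with_nops(insn, count)
    irsb = pyvex.lift(data, base + 1, archinfo.ArchARMEL(), bytes_offset=1)
    return last_insn_stmts(irsb)


padded, off = pad_with_nops(bytes.fromhex("3846"))
print(padded.hex(), off)  # 00bf00bf00bf3846 6
```

Iterating over lift_padded(bytes.fromhex("3846")) and calling pp() on each statement would show whether the ITE guard is still present for the padded MOV.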

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

This issue has been closed due to inactivity.