angr/pyvex

Pyvex on Thumb assembly code

Closed this issue · 10 comments

Hi, I am confused about why pyvex translates this assembly code into the following IR.
The assembly code is below:

LOAD:FE84B32E                 MOVW            R8, #0x81F8
LOAD:FE84B332                 ADD             R3, SP, #0x28+var_1C
LOAD:FE84B334                 MOVT.W          R8, #0xFC4B
LOAD:FE84B338                 ADD             R2, SP, #0x28+var_24
LOAD:FE84B33A                 MOV             R0, R8

The translated IR is below:

   00 | ------ IMark(0xfe84b32e, 4, 1) ------
   01 | t0 = GET:I32(itstate)
   02 | t1 = Shr32(t0,0x08)
   03 | t41 = And32(t0,0x000000f0)
   04 | t40 = Xor32(t41,0x000000e0)
   05 | t42 = GET:I32(cc_op)
   06 | t39 = Or32(t42,t40)
   07 | t43 = GET:I32(cc_dep1)
   08 | t44 = GET:I32(cc_dep2)
   09 | t45 = GET:I32(cc_ndep)
   10 | t46 = armg_calculate_condition(t39,t43,t44,t45):Ity_I32
   11 | t48 = CmpNE32(t41,0x00000000)
   12 | t47 = ITE(t48,t46,0x00000001)
   13 | t53 = GET:I32(r8)
   14 | t54 = CmpNE32(t47,0x00000000)
   15 | t52 = ITE(t54,0x000081f8,t53)
   16 | ------ IMark(0xfe84b332, 2, 1) ------
   17 | t7 = Shr32(t1,0x08)
   18 | t57 = And32(t1,0x000000f0)
   19 | t56 = Xor32(t57,0x000000e0)
   20 | t55 = Or32(t42,t56)
   21 | t62 = armg_calculate_condition(t55,t43,t44,t45):Ity_I32
   22 | t64 = CmpNE32(t57,0x00000000)
   23 | t63 = ITE(t64,t62,0x00000001)
   24 | t69 = GET:I32(r3)
   25 | t71 = GET:I32(sp)
   26 | t70 = Add32(t71,0x0000000c)
   27 | t72 = CmpNE32(t63,0x00000000)
   28 | t68 = ITE(t72,t70,t69)
   29 | PUT(r3) = t68
   30 | ------ IMark(0xfe84b334, 4, 1) ------
   31 | t13 = Shr32(t7,0x08)
   32 | t75 = And32(t7,0x000000f0)
   33 | t74 = Xor32(t75,0x000000e0)
   34 | t73 = Or32(t42,t74)
   35 | t80 = armg_calculate_condition(t73,t43,t44,t45):Ity_I32
   36 | t82 = CmpNE32(t75,0x00000000)
   37 | t81 = ITE(t82,t80,0x00000001)
   38 | t87 = And32(t52,0x0000ffff)
   39 | t86 = Or32(t87,0xfc4b0000)
   40 | t91 = CmpNE32(t81,0x00000000)
   41 | t89 = ITE(t91,t86,t52)
   42 | PUT(r8) = t89
   43 | ------ IMark(0xfe84b338, 2, 1) ------
   44 | t20 = Shr32(t13,0x08)
   45 | t94 = And32(t13,0x000000f0)
   46 | t93 = Xor32(t94,0x000000e0)
   47 | t92 = Or32(t42,t93)
   48 | t99 = armg_calculate_condition(t92,t43,t44,t45):Ity_I32
   49 | t101 = CmpNE32(t94,0x00000000)
   50 | t100 = ITE(t101,t99,0x00000001)
   51 | t106 = GET:I32(r2)
   52 | t107 = Add32(t71,0x00000004)
   53 | t109 = CmpNE32(t100,0x00000000)
   54 | t105 = ITE(t109,t107,t106)
   55 | PUT(r2) = t105
   56 | ------ IMark(0xfe84b33a, 2, 1) ------
   57 | t26 = Shr32(t20,0x08)
   58 | t112 = And32(t20,0x000000f0)
   59 | t111 = Xor32(t112,0x000000e0)
   60 | t110 = Or32(t42,t111)
   61 | t117 = armg_calculate_condition(t110,t43,t44,t45):Ity_I32
   62 | t119 = CmpNE32(t112,0x00000000)
   63 | t118 = ITE(t119,t117,0x00000001)
   64 | t124 = GET:I32(r0)
   65 | t125 = CmpNE32(t118,0x00000000)
   66 | t123 = ITE(t125,t89,t124)
   67 | PUT(r0) = t123
   68 | PUT(pc) = 0xfe84b33d

I am confused on these points.

  1. Why does vex calculate the condition for every instruction?
  2. I may not understand ITE well. For example, for the instruction at 0xfe84b33a, why does vex use ITE to perform the MOV operation (lines 64-67)? I don't think this instruction would be executed conditionally. Many thanks.

The answer to both of your questions is that this is how vex encodes whether or not the instructions should be skipped because they are part of an IT block. If you lift the instructions as part of an entire project (more accurately, by passing a whole page of data and an offset instead of a bytestring), vex will be able to look backwards, determine that there is no IT instruction there, and remove the calculation.
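To make the guard pattern in the IR listing concrete, here is a minimal Python model of what statements 01-15 compute for the first instruction. It is only a sketch of the logic visible in the listing, not pyvex or libvex code: `split_itstate`, `guard_for_lane`, `guarded_write`, and the single-argument `calc_condition` callback (a simplified stand-in for `armg_calculate_condition`) are hypothetical names.

```python
from typing import Callable, List


def split_itstate(itstate: int) -> List[int]:
    """Split the 32-bit itstate value into four 8-bit lanes, lowest first.

    The listing consumes one lane per instruction via Shr32(..., 0x08).
    """
    return [(itstate >> (8 * i)) & 0xFF for i in range(4)]


def guard_for_lane(lane: int, cc_op: int,
                   calc_condition: Callable[[int], int]) -> int:
    """Return 1 if the instruction should execute, 0 if it should be skipped.

    calc_condition is a hypothetical stand-in for armg_calculate_condition;
    the real helper also receives cc_dep1, cc_dep2 and cc_ndep.
    """
    cond_bits = lane & 0xF0              # t41 = And32(t0, 0xf0)
    if cond_bits == 0:                   # t48 = CmpNE32(t41, 0) is false
        return 1                         # ITE(t48, t46, 0x1): unconditional
    return calc_condition(cc_op | (cond_bits ^ 0xE0))  # t40, t39, t46


def guarded_write(guard: int, new_value: int, old_value: int) -> int:
    """Model of ITE(guard != 0, new_value, old_value) before a PUT."""
    return new_value if guard != 0 else old_value


if __name__ == "__main__":
    # With itstate == 0 (no IT block active) every lane is unguarded,
    # so the MOVW result 0x81f8 is always selected.
    lane0 = split_itstate(0)[0]
    g = guard_for_lane(lane0, cc_op=0, calc_condition=lambda packed: 0)
    print(hex(guarded_write(g, 0x81F8, 0x1234)))
```

When vex cannot prove that `itstate` is zero at lift time, it has to emit this selection for every instruction, which is why even the plain `MOV R0, R8` ends in an `ITE`.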

more accurately, by passing a whole page of data and an offset instead of a bytestring

In this way, pyvex may need to construct the whole control flow graph, right? I noticed that with the function pyvex.lift, pyvex stops lifting when it comes across a branch instruction, so I need to feed the bytes of the next basic block to vex again. Thus, I don't think this strategy works.

Is it possible to tell vex that this is not part of an IT block? Is there an explicit option in the API? Many thanks.

  • libvex never does any control flow recovery. god forbid. 😰
  • instead, it will scan backwards four instructions and check if any of them are ITs. This works because IT blocks may never contain jumps. This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.
  • You could theoretically do this... the problem is that libvex is designed to always do this check, since it is designed to do exactly one thing which is make valgrind work, and we added code to make the whole analysis automatically return "there could be an IT" depending on that flag. You would want to change the bool to an int, and then make a third value trigger automatically returning "there could not be an IT" and then add a parameter to the python interface that controls it.
  • I don't have the time to do this myself though - but vex is pretty fun to hack with, you should give it a shot. We would gladly accept that PR :)
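The back-scan and the proposed three-valued flag can be sketched in plain Python as follows. This is an illustration under stated assumptions, not libvex's actual code: `ITPolicy`, `is_it_halfword`, `may_be_in_it_block`, and the `lookback_halfwords` default of 8 (enough to cover four 32-bit Thumb instructions) are all hypothetical. The only architectural fact used is the 16-bit IT encoding `0xBFxy` with a non-zero mask nibble `y`.

```python
from enum import Enum


class ITPolicy(Enum):
    ASSUME_POSSIBLE = 0   # always answer "there could be an IT"
    SCAN_BACKWARDS = 1    # look at the preceding bytes to decide
    ASSUME_ABSENT = 2     # hypothetical third value: "there could not be an IT"


def is_it_halfword(hw: int) -> bool:
    """True if a 16-bit Thumb halfword matches the IT encoding (0xBFxy, y != 0)."""
    return (hw & 0xFF00) == 0xBF00 and (hw & 0x000F) != 0


def may_be_in_it_block(code: bytes, offset: int, policy: ITPolicy,
                       lookback_halfwords: int = 8) -> bool:
    """Conservatively decide whether the instruction at `offset` may be IT-guarded."""
    if policy is ITPolicy.ASSUME_POSSIBLE:
        return True
    if policy is ITPolicy.ASSUME_ABSENT:
        return False
    # Not enough preceding context (e.g. an isolated bytestring): stay conservative.
    if offset < 2 * lookback_halfwords:
        return True
    for i in range(1, lookback_halfwords + 1):
        start = offset - 2 * i
        hw = int.from_bytes(code[start:start + 2], "little")
        # The second half of a 32-bit instruction can also match this pattern.
        # That only makes the answer more conservative, never wrong.
        if is_it_halfword(hw):
            return True
    return False


if __name__ == "__main__":
    mov = (0x4640).to_bytes(2, "little")    # MOV R0, R8
    it_eq = (0xBF08).to_bytes(2, "little")  # IT EQ
    in_context = mov * 8 + mov
    print(may_be_in_it_block(in_context, 16, ITPolicy.SCAN_BACKWARDS))
    print(may_be_in_it_block(mov * 7 + it_eq + mov, 16, ITPolicy.SCAN_BACKWARDS))
    print(may_be_in_it_block(mov, 0, ITPolicy.SCAN_BACKWARDS))
```

The isolated-bytestring case returns `True` because there is nothing to scan, which mirrors why lifting a bare bytestring keeps the guards.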

Thanks for your explanation. I am confused about the following part.

instead, it will scan backwards four instructions and check if any of them are ITs. This works because IT blocks may never contain jumps. This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.

Scanning backwards four instructions should be the right behavior. However, let's go back to the example I showed, specifically the instruction at 0xFE84B33A. None of the previous four instructions is an IT instruction (and I fed them all to vex). However, vex still does the IT check. Do you think this is normal behavior?

This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.

I am sorry, but I do not understand what "actually in-context" means. I think the bytes I fed to vex are enough for it to know that at least the instruction at 0xFE84B33A is not in an IT block.

You could theoretically do this... the problem is that libvex is designed to always do this check, since it is designed to do exactly one thing which is make valgrind work, and we added code to make the whole analysis automatically return "there could be an IT" depending on that flag. You would want to change the bool to an int, and then make a third value trigger automatically returning "there could not be an IT" and then add a parameter to the python interface that controls it.

Thank you so much for the suggestions. I would like to try to add the parameter, and I will send a PR if I can make it work. Thanks.

You're right - it can be optimized as-is. Right now the check is if (allow_optimizing) optimize() but it could be if (allow_optimizing || num_instructions_in_block > 4) optimize(). That would be another cool thing for you to add!
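As a rough Python rendering of that proposed condition: the function name `may_scan_for_it` and the reading of `num_instructions_in_block` as the 1-based position of the current instruction within the block being lifted are assumptions made for illustration. The constant reflects the fact that one IT instruction can make at most four following instructions conditional.

```python
MAX_IT_SPAN = 4  # an IT instruction covers at most four following instructions


def may_scan_for_it(allow_optimizing: bool, position_in_block: int) -> bool:
    """Sketch of `allow_optimizing || num_instructions_in_block > 4`.

    position_in_block is assumed to be 1-based, so the fifth instruction
    of a block has four already-lifted instructions in front of it.
    """
    return allow_optimizing or position_in_block > MAX_IT_SPAN


if __name__ == "__main__":
    # Addresses from the example in this thread, in lifting order.
    addrs = [0xFE84B32E, 0xFE84B332, 0xFE84B334, 0xFE84B338, 0xFE84B33A]
    for pos, addr in enumerate(addrs, start=1):
        print(hex(addr), may_scan_for_it(False, pos))
```

Under this reading, only the `MOV` at 0xfe84b33a qualifies when optimizations are disallowed, since the four instructions before it are already inside the supplied bytes.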

Okay, I will do it ASAP. I tried to develop it with angr-dev. However, pyvex still has other bugs that prevent it from compiling successfully. Please refer to angr/angr-dev#87.

Hi @rhelmot. Sorry to bother you again. I doubt whether the flag you added can really control the analysis.

I set allow_arch_optimizations = False. I guessed this was the direct way to stop vex from checking for the existence of an IT block. However, the output is the same. It seems there might be some other option controlling this... I am still digging through the code.

allow_arch_optimizations = False will do the opposite of what you want. Doing the back-scanning is an optimization, and that will disallow it.


This issue has been closed due to inactivity.