Pyvex on Thumb assembly code
Hi, I am confused about why pyvex translates the following assembly code into the IR shown below.
The assembly code is below:
LOAD:FE84B32E MOVW R8, #0x81F8
LOAD:FE84B332 ADD R3, SP, #0x28+var_1C
LOAD:FE84B334 MOVT.W R8, #0xFC4B
LOAD:FE84B338 ADD R2, SP, #0x28+var_24
LOAD:FE84B33A MOV R0, R8
The translated IR is below:
00 | ------ IMark(0xfe84b32e, 4, 1) ------
01 | t0 = GET:I32(itstate)
02 | t1 = Shr32(t0,0x08)
03 | t41 = And32(t0,0x000000f0)
04 | t40 = Xor32(t41,0x000000e0)
05 | t42 = GET:I32(cc_op)
06 | t39 = Or32(t42,t40)
07 | t43 = GET:I32(cc_dep1)
08 | t44 = GET:I32(cc_dep2)
09 | t45 = GET:I32(cc_ndep)
10 | t46 = armg_calculate_condition(t39,t43,t44,t45):Ity_I32
11 | t48 = CmpNE32(t41,0x00000000)
12 | t47 = ITE(t48,t46,0x00000001)
13 | t53 = GET:I32(r8)
14 | t54 = CmpNE32(t47,0x00000000)
15 | t52 = ITE(t54,0x000081f8,t53)
16 | ------ IMark(0xfe84b332, 2, 1) ------
17 | t7 = Shr32(t1,0x08)
18 | t57 = And32(t1,0x000000f0)
19 | t56 = Xor32(t57,0x000000e0)
20 | t55 = Or32(t42,t56)
21 | t62 = armg_calculate_condition(t55,t43,t44,t45):Ity_I32
22 | t64 = CmpNE32(t57,0x00000000)
23 | t63 = ITE(t64,t62,0x00000001)
24 | t69 = GET:I32(r3)
25 | t71 = GET:I32(sp)
26 | t70 = Add32(t71,0x0000000c)
27 | t72 = CmpNE32(t63,0x00000000)
28 | t68 = ITE(t72,t70,t69)
29 | PUT(r3) = t68
30 | ------ IMark(0xfe84b334, 4, 1) ------
31 | t13 = Shr32(t7,0x08)
32 | t75 = And32(t7,0x000000f0)
33 | t74 = Xor32(t75,0x000000e0)
34 | t73 = Or32(t42,t74)
35 | t80 = armg_calculate_condition(t73,t43,t44,t45):Ity_I32
36 | t82 = CmpNE32(t75,0x00000000)
37 | t81 = ITE(t82,t80,0x00000001)
38 | t87 = And32(t52,0x0000ffff)
39 | t86 = Or32(t87,0xfc4b0000)
40 | t91 = CmpNE32(t81,0x00000000)
41 | t89 = ITE(t91,t86,t52)
42 | PUT(r8) = t89
43 | ------ IMark(0xfe84b338, 2, 1) ------
44 | t20 = Shr32(t13,0x08)
45 | t94 = And32(t13,0x000000f0)
46 | t93 = Xor32(t94,0x000000e0)
47 | t92 = Or32(t42,t93)
48 | t99 = armg_calculate_condition(t92,t43,t44,t45):Ity_I32
49 | t101 = CmpNE32(t94,0x00000000)
50 | t100 = ITE(t101,t99,0x00000001)
51 | t106 = GET:I32(r2)
52 | t107 = Add32(t71,0x00000004)
53 | t109 = CmpNE32(t100,0x00000000)
54 | t105 = ITE(t109,t107,t106)
55 | PUT(r2) = t105
56 | ------ IMark(0xfe84b33a, 2, 1) ------
57 | t26 = Shr32(t20,0x08)
58 | t112 = And32(t20,0x000000f0)
59 | t111 = Xor32(t112,0x000000e0)
60 | t110 = Or32(t42,t111)
61 | t117 = armg_calculate_condition(t110,t43,t44,t45):Ity_I32
62 | t119 = CmpNE32(t112,0x00000000)
63 | t118 = ITE(t119,t117,0x00000001)
64 | t124 = GET:I32(r0)
65 | t125 = CmpNE32(t118,0x00000000)
66 | t123 = ITE(t125,t89,t124)
67 | PUT(r0) = t123
68 | PUT(pc) = 0xfe84b33d
I am confused about these points:
- Why does vex calculate the condition for every instruction?
- I may not understand ITE well. For example, for the instruction at 0xfe84b33a, why does vex use ITE to perform the mov operation (lines 64-67)? I don't think this instruction would be executed conditionally.
Many thanks.
The answer to both of your questions is that this is how vex encodes whether or not the instructions should be skipped because they are part of an IT block. If you lift the instructions as part of an entire project (more accurately, by passing a whole page of data and an offset instead of a bytestring), vex will be able to look backwards, determine that there is no IT instruction there, and remove the calculation.
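As a rough illustration of the explanation above, the guard that precedes each instruction in the listing can be modelled in plain Python. This is a sketch read off the IR in this thread, not libvex code: `guard_and_advance`, `run_block`, and `condition_holds` (a stand-in for `armg_calculate_condition`) are hypothetical names, and the starting register values are made up.

```python
MASK32 = 0xFFFFFFFF


def ite(cond, if_true, if_false):
    """VEX-style ITE(cond, iftrue, iffalse): pick if_true when cond is nonzero."""
    return if_true if cond else if_false


def guard_and_advance(itstate, condition_holds):
    """Mirror the per-instruction prologue in the IR listing.

    `condition_holds` is a hypothetical stand-in for
    armg_calculate_condition; it receives the lane bits XORed with 0xE0.
    Returns (guard, itstate shifted to the next instruction's lane).
    """
    lane = itstate & 0xF0                              # And32(t0, 0xf0)
    cond = 1 if condition_holds(lane ^ 0xE0) else 0    # Xor32 + helper call
    guard = ite(lane != 0, cond, 1)                    # ITE(CmpNE32(lane, 0), cond, 1)
    return guard, (itstate >> 8) & MASK32              # Shr32(t0, 0x08)


def run_block(regs, itstate, condition_holds):
    """Apply the five instructions the way the lifted IR does."""
    regs = dict(regs)
    g, itstate = guard_and_advance(itstate, condition_holds)
    r8 = ite(g, 0x000081F8, regs["r8"])                           # MOVW R8, #0x81F8
    g, itstate = guard_and_advance(itstate, condition_holds)
    regs["r3"] = ite(g, (regs["sp"] + 0xC) & MASK32, regs["r3"])  # ADD R3, SP, ...
    g, itstate = guard_and_advance(itstate, condition_holds)
    regs["r8"] = ite(g, (r8 & 0xFFFF) | 0xFC4B0000, r8)           # MOVT.W R8, #0xFC4B
    g, itstate = guard_and_advance(itstate, condition_holds)
    regs["r2"] = ite(g, (regs["sp"] + 0x4) & MASK32, regs["r2"])  # ADD R2, SP, ...
    g, itstate = guard_and_advance(itstate, condition_holds)
    regs["r0"] = ite(g, regs["r8"], regs["r0"])                   # MOV R0, R8
    return regs


# Made-up starting state, for illustration only.
start = {"r0": 0, "r2": 0, "r3": 0, "r8": 0, "sp": 0x1000}

# itstate == 0: every guard is 1, so each ITE picks the new value.
plain = run_block(start, itstate=0, condition_holds=lambda bits: False)
print(hex(plain["r0"]))  # 0xfc4b81f8

# Nonzero first lane with a failing condition: only the MOVW is skipped.
skipped = run_block(start, itstate=0x10, condition_holds=lambda bits: False)
print(hex(skipped["r0"]))  # 0xfc4b0000
```

With an all-zero itstate every ITE collapses to its "new value" arm, which is the unconditional behaviour the question expected; the guards remain in the IR only because the lifter could not rule out a preceding IT instruction.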
more accurately, by passing a whole page of data and an offset instead of a bytestring
In that case, pyvex would need to construct the whole control flow graph, right? I noticed that pyvex.lift stops lifting when it comes across a branch instruction, so I have to feed the bytes of the next basic block to vex again. Thus, I don't think this strategy works.
Is it possible to tell vex that this is not part of an IT block? Is there an explicit option in the API? Many thanks.
- libvex never does any control flow recovery. god forbid. 😰
- instead, it will scan backwards four instructions and check if any of them are ITs. This works because IT blocks may never contain jumps. This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.
- You could theoretically do this... the problem is that libvex is designed to always do this check, since it is designed to do exactly one thing which is make valgrind work, and we added code to make the whole analysis automatically return "there could be an IT" depending on that flag. You would want to change the bool to an int, and then make a third value trigger automatically returning "there could not be an IT" and then add a parameter to the python interface that controls it.
- I don't have the time to do this myself though - but vex is pretty fun to hack with, you should give it a shot. We would gladly accept that PR :)
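The back-scan and the proposed three-valued flag described in the list above could be sketched as follows. This is a simplified model and not libvex's implementation: the `ITCheck` enum, its member names, `may_be_in_it_block`, and the `lookback_halfwords` default are all assumptions made for illustration. The only hard fact used is the 16-bit Thumb IT encoding, 0xBFxy with a nonzero low nibble.

```python
import enum
import struct


class ITCheck(enum.Enum):
    """Hypothetical three-way switch; names are illustrative, not libvex's."""
    ASSUME_POSSIBLE = 0    # never scan: always emit the ITSTATE guard
    SCAN_BACKWARDS = 1     # scan preceding halfwords for an IT instruction
    ASSUME_IMPOSSIBLE = 2  # proposed third value: never emit the guard


def is_it_halfword(hw):
    """True if a 16-bit Thumb halfword has the IT encoding 0xBFxy with y != 0."""
    return (hw & 0xFF00) == 0xBF00 and (hw & 0x000F) != 0


def may_be_in_it_block(code, offset, mode, lookback_halfwords=4):
    """Decide whether the instruction at byte `offset` needs the ITSTATE guard."""
    if mode is ITCheck.ASSUME_POSSIBLE:
        return True
    if mode is ITCheck.ASSUME_IMPOSSIBLE:
        return False
    # SCAN_BACKWARDS: without enough preceding bytes an IT cannot be ruled out.
    if offset < 2 * lookback_halfwords:
        return True
    for i in range(1, lookback_halfwords + 1):
        (hw,) = struct.unpack_from("<H", code, offset - 2 * i)
        if is_it_halfword(hw):
            return True
    return False


# Five little-endian halfwords; 0x4640 is used here simply as a non-IT halfword.
no_it = struct.pack("<5H", 0x4640, 0x4640, 0x4640, 0x4640, 0x4640)
with_it = struct.pack("<5H", 0x4640, 0x4640, 0xBF18, 0x4640, 0x4640)

print(may_be_in_it_block(no_it, 8, ITCheck.SCAN_BACKWARDS))     # False
print(may_be_in_it_block(with_it, 8, ITCheck.SCAN_BACKWARDS))   # True
print(may_be_in_it_block(no_it, 0, ITCheck.SCAN_BACKWARDS))     # True
print(may_be_in_it_block(with_it, 8, ITCheck.ASSUME_IMPOSSIBLE))  # False
```

The third call shows why an isolated bytestring is a problem: at offset 0 there is nothing to scan, so the conservative answer is "there could be an IT".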
Thanks for your explanation. I am confused about the following part.
instead, it will scan backwards four instructions and check if any of them are ITs. This works because IT blocks may never contain jumps. This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.
Scanning backwards four instructions should be the right behavior. However, let's go back to the example I showed. Consider the instruction at 0xFE84B33A. None of the previous four instructions is an IT instruction (and I fed all of them to vex). However, vex still does the IT check. Do you think this is normal behavior?
This of course only works if the instructions being passed are actually in-context, instead of isolated in a bytestring.
I am sorry, but I do not understand the meaning of "actually in-context". I think the bytes I fed to vex are enough for vex to know that at least the instruction at 0xFE84B33A is not in an IT block.
You could theoretically do this... the problem is that libvex is designed to always do this check, since it is designed to do exactly one thing which is make valgrind work, and we added code to make the whole analysis automatically return "there could be an IT" depending on that flag. You would want to change the bool to an int, and then make a third value trigger automatically returning "there could not be an IT" and then add a parameter to the python interface that controls it.
Thank you so much for the suggestions. I would like to try adding the parameter, and will send a PR if I can make it work. Thanks.
You're right - it can be optimized as-is. Right now the check is if (allow_optimizing) optimize(), but it could be if (allow_optimizing || num_instructions_in_block > 4) optimize(). That would be another cool thing for you to add!
Okay, I will do it asap. I tried to develop it with angr-dev. However, pyvex still has other bugs that prevent it from compiling successfully. Please refer to angr/angr-dev#87
Hi @rhelmot. Sorry to bother you again. I doubt whether the flag you added really controls the analysis.
I set allow_arch_optimizations = False. I guessed this was the direct way to stop vex from checking for an IT block. However, the output is the same. It seems there might be some other option controlling this... I am still digging through the code.
allow_arch_optimizations = False will do the opposite of what you want. Doing the back-scanning is an optimization, and that will disallow it.
This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.
This issue has been closed due to inactivity.