angr/pyvex

Limit of the number of instructions that pyvex.lift can lift each time

luo8979061 opened this issue · 6 comments

Question

Is there a limit on how many assembly instructions pyvex can translate in a single call? There seems to be a maximum of about 100. I ran into this when executing the following code:

irsb = pyvex.lift(opcodes, source_addr, archinfo.ArchAMD64(), opt_level=0)

The following result is obtained:
IRSB <0x16f bytes, 99 ins., <Arch AMD64 (LE)>> at 0x7fffece0f9ee
However, the opcodes span more than 0x16f bytes, so more than 99 assembly instructions should have been translated.

ltfish commented

The limit is 100, and I believe this limit is coming from libVEX, not PyVEX.

Is there any way to change the limit to 200?

200 will... probably work? The main reason there are hard limits is that libvex doesn't use a dynamic memory allocator, so sizes have to be kept firmly in check. However, we've found that there isn't much point in having a larger limit. Since libvex does fully dynamic lifting, without control flow recovery, any analysis built on vex already has to deal with basic blocks that start at points other than control flow junctions.
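Because a block can start at any address, one way to live with the limit is to lift again from wherever the previous block stopped. The following is only a minimal sketch of that idea, not something taken from pyvex itself: `lift_in_chunks` and `pyvex_lifter` are hypothetical helper names, and the sketch assumes that the returned IRSB's `size` attribute is the number of bytes the block covered. Note that lifting also stops at real control flow instructions, so this amounts to a linear sweep over the buffer.

```python
from typing import Callable, List


def lift_in_chunks(data: bytes, addr: int, lift_one: Callable) -> List:
    """Lift `data` piece by piece, resuming where the previous block ended.

    `lift_one(data, addr)` is a caller-supplied callback (hypothetical
    interface) that returns an object whose `.size` attribute is the
    number of bytes the lifted block covered.
    """
    blocks = []
    offset = 0
    while offset < len(data):
        # Hand the lifter only the bytes that have not been covered yet.
        block = lift_one(data[offset:], addr + offset)
        if block.size == 0:
            # Nothing was decoded; stop instead of looping forever.
            break
        blocks.append(block)
        offset += block.size
    return blocks


def pyvex_lifter(arch, opt_level: int = 0) -> Callable:
    """Build a `lift_one` callback backed by pyvex.lift.

    pyvex is imported lazily so that lift_in_chunks can be used and
    tested without pyvex installed.
    """
    import pyvex

    def lift_one(data, addr):
        return pyvex.lift(data, addr, arch, opt_level=opt_level)

    return lift_one
```

With the variables from the question, usage would look like `blocks = lift_in_chunks(opcodes, source_addr, pyvex_lifter(archinfo.ArchAMD64()))`. Each element is a separate IRSB, so any data flow facts still have to be carried from one block to the next by the analysis.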

I have changed the limit to 200 in the code at the following two links:
https://github.com/angr/vex/blob/939f423dbb6282cf14bc5d90ff8b37c2c5992e65/priv/guest_generic_bb_to_IR.c#L228
https://github.com/angr/vex/blob/939f423dbb6282cf14bc5d90ff8b37c2c5992e65/priv/main_main.c#L310
However, pyvex.lift now reports that only 105 assembly instructions were converted:
IRSB <0x182 bytes, 105 ins., <Arch AMD64 (LE)>> at 0x7fffece0f9ee

What I actually want is data flow analysis across basic blocks. To do that, I ignored the jump instructions between basic blocks, which is equivalent to merging several basic blocks into one large block. I then use pyvex.lift to convert that large block into VEX IR and run the data flow analysis on the result. However, pyvex.lift itself currently limits the number of instructions. Do you have any good suggestions?

If you want to do any serious static analysis with pyvex, you want to be using angr. That's as good a suggestion as I can provide.