angr/pyvex

Error lifting ARM mov Opcode that is being mistaken for a ccall

I've isolated an error when lifting ARM opcodes as demonstrated when executing the following:
irsb = pyvex.lift(b'\x49\xf6\x50\x79\xc0\xf2\x0e\x09', 0x36eb0, archinfo.ArchARMEL())

According to IDA, this is a mov instruction: MOV R9, #0xE9F50
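For reference, the IDA reading can be checked by hand. The sketch below decodes the eight bytes as a pair of 32-bit Thumb-2 instructions, assuming the MOVW (T3) and MOVT (T1) immediate encodings from the ARM architecture manual. The function name decode_mov_imm16 is illustrative and not part of pyvex or IDA.

```python
import struct


def decode_mov_imm16(insn):
    """Decode one 32-bit Thumb-2 MOVW (T3) or MOVT (T1) instruction.

    Returns (mnemonic, destination register number, 16-bit immediate).
    """
    # A 32-bit Thumb-2 instruction is stored as two little-endian halfwords.
    hw1, hw2 = struct.unpack("<HH", insn)

    # Clear i (bit 10) and imm4 (bits 3:0) to leave only the fixed opcode bits.
    opcode = hw1 & 0xFBF0
    if opcode == 0xF240:
        mnemonic = "movw"
    elif opcode == 0xF2C0:
        mnemonic = "movt"
    else:
        raise ValueError("not a MOVW/MOVT immediate encoding")
    if hw2 & 0x8000:
        raise ValueError("bit 15 of the second halfword must be 0")

    imm4 = hw1 & 0xF
    i = (hw1 >> 10) & 1
    imm3 = (hw2 >> 12) & 0x7
    rd = (hw2 >> 8) & 0xF
    imm8 = hw2 & 0xFF

    # The immediate is the concatenation imm4:i:imm3:imm8.
    imm16 = (imm4 << 12) | (i << 11) | (imm3 << 8) | imm8
    return mnemonic, rd, imm16


code = b"\x49\xf6\x50\x79\xc0\xf2\x0e\x09"
low = decode_mov_imm16(code[0:4])    # MOVW sets the low 16 bits of R9
high = decode_mov_imm16(code[4:8])   # MOVT sets the high 16 bits of R9
value = (high[2] << 16) | low[2]
print(low, high, hex(value))
```

The two halves are a MOVW and a MOVT into R9, and together they load 0xE9F50, which matches the single MOV that IDA displays.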

However, pyvex tries to invoke ccall from angr.engines.vex because it thinks that this is a conditional call, which it is not. This raises an error because I only have pyvex installed, yet the pyvex code base still tries to access an angr module.

So here are my questions:

  1. Why is pyvex mistaking this move for a conditional call?
  2. Why does the standalone pyvex package installed from pip have dependencies on angr?

Here's the culprit:

from angr.engines.vex import ccall

We should probably fix this... There is also a test depending on angr, which we really shouldn't be doing here.

There are a lot of things going wrong here.

  1. The bytecode you posted is thumb code. You're disassembling it as ARM code. It is kind of obscure how you need to present thumb code to vex in order to get it processed correctly, but here's how you do it:

pyvex.lift(bytecode, address+1, arch, bytes_offset=1)

With this, we encode the thumb-ness of the code as the least significant bit of the instruction pointer, same as BX calls. Additionally, you need to specify bytes_offset to inform pyvex that the address you've named is in fact 1 byte into the bytestring.
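A minimal sketch of that convention, wrapped in helpers so the odd address and the offset cannot drift apart. The names thumb_lift_args and lift_thumb are hypothetical and not part of pyvex; only pyvex.lift and its bytes_offset argument come from the call shown above.

```python
def thumb_lift_args(address):
    """Return (lift_address, bytes_offset) for Thumb code starting at `address`.

    `address` is the real (even) address of the first instruction byte.
    """
    if address & 1:
        raise ValueError("expected the even address of the first instruction")
    # Set the least significant bit to mark the code as Thumb, and report
    # that the odd address sits one byte into the byte string.
    return address | 1, 1


def lift_thumb(code, address, arch):
    """Lift `code` as Thumb. Requires pyvex, imported only when called."""
    import pyvex

    lift_address, bytes_offset = thumb_lift_args(address)
    return pyvex.lift(code, lift_address, arch, bytes_offset=bytes_offset)


lift_address, bytes_offset = thumb_lift_args(0x36eb0)
print(hex(lift_address), bytes_offset)
```

With pyvex and archinfo installed, the original bytes would be lifted as lift_thumb(b'\x49\xf6\x50\x79\xc0\xf2\x0e\x09', 0x36eb0, archinfo.ArchARMEL()).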

  2. ccalls are not conditional calls, they are clean calls. When vex needs to model an expression too complicated to express in IR, it will generate a call to a helper function. These can be clean (no side effects) or dirty (can mutate state freely). They are mostly used to deal with the status register flags on ARM and x86.

  3. It is indeed sad that we depend on angr for part of the ccall handling. However, this support is only used in our extensions to the libvex lifter. It shouldn't be touched if you're lifting kosher code that can run in the Linux userland. I agree this is still a bug though, and I will assign this to the person who ought to fix it.
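To make the clean-helper idea above concrete, here is a toy example of the kind of computation such a helper performs: deriving the ARM N, Z, C and V flags for a 32-bit addition. It is a pure function of its arguments, which is what makes a call "clean". This is an illustration only; the name arm_add_flags and its signature are made up and do not mirror the actual helpers in libvex or angr.

```python
MASK32 = 0xFFFFFFFF


def arm_add_flags(a, b):
    """Return (N, Z, C, V) for the 32-bit addition a + b.

    Pure function: reads only its arguments and touches no other state.
    """
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32

    n = res >> 31                 # negative: top bit of the result
    z = int(res == 0)             # zero: result is all zeroes
    c = int(full > MASK32)        # carry: unsigned overflow out of bit 31
    # overflow: both operands differ in sign from the result
    v = (((a ^ res) & (b ^ res)) >> 31) & 1
    return n, z, c, v


print(arm_add_flags(0xFFFFFFFF, 1))
print(arm_add_flags(0x7FFFFFFF, 1))
```

A dirty helper, by contrast, would be allowed to read or modify guest state beyond the values passed in.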

@rhelmot Thanks for the detailed explanation that includes how to present thumb code to vex

All of these things are accurate -- except this isn't necessarily a bug, but more of a UX thing. We very deliberately made that coupling choice, and there's no sane way to make it go away without brutally refactoring angr and pyvex's handling of ccalls. (we should do this, but not right now) Currently, all CCalls are maintained statically in angr's ccall.py, and this nasty hacktastic dumpster fire of a feature allows for user-provided CCalls for lifters, while avoiding even worse coupling problems. (You can have my "BF" CCalls in angr proper if you want... No? Thought so :) )

In @kye4u2's original example, the code was lifted with the wrong thumbness -- in standard ARM VEX, this causes a decode error, and the lifting is passed on to our lifter extension, which handles it, correctly, as a super messed-up variant of LDM only present in system code (with the modeswitch flag set). This triggers the lifting of a CCall (the flag calculation CCall for all instructions) which triggers this issue.

tl;dr: Lift it as thumb and the problem goes completely away.
But the fix I'm going to do here is to wall off the Gymrat extensions when angr is not installed. The result will be the same, just with a chastising exception message instead of that confusing one, and causing an Ijk_NoDecode.
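A rough sketch of what walling off an optional dependency can look like in general. This is not the actual pyvex change: load_optional_helpers and ExtensionUnavailable are hypothetical names, and the only detail taken from this thread is that the helpers are imported as ccall from angr.engines.vex.

```python
import importlib


class ExtensionUnavailable(Exception):
    """Hypothetical error raised when an optional dependency is missing."""


def load_optional_helpers(package="angr.engines.vex", attr="ccall"):
    """Return `attr` from `package`, or raise a clear error if unavailable."""
    try:
        module = importlib.import_module(package)
        return getattr(module, attr)
    except (ImportError, AttributeError) as exc:
        # Replace the confusing import failure with an explicit message.
        raise ExtensionUnavailable(
            "%s.%s is required by the lifter extensions but is not "
            "available; install angr or lift only code that libvex "
            "handles natively" % (package, attr)
        ) from exc


try:
    load_optional_helpers("no_such_package_for_demo", "ccall")
except ExtensionUnavailable as err:
    print("extension disabled:", err)
```

A caller in the lifter could catch this exception and end the block with a no-decode exit instead of letting an ImportError escape.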

That said, if anyone has ideas on how we should finally clean up CCalls once and for all without going insane, let's discuss

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

This issue has been closed due to inactivity.