Possibly incorrect lifting of ldrsw to VEX on arm64
vadimkotov opened this issue · 1 comments
Hi folks,
I think I've come across an inaccuracy when lifting the following piece of code from arm64 to VEX (unless I'm using it wrong):
00114d4c ldrsw x15, DAT_00114d70
The ldrsw instruction is supposed to load a word (32-bit value) from the address and sign-extend it into x15 (see the Arm user guide).
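To make the expected semantics concrete, here is a minimal Python sketch of the 32-to-64-bit sign extension that ldrsw should perform. The helper name sign_extend_32_to_64 is made up for illustration and is not part of pyvex or angr.

```python
def sign_extend_32_to_64(word: int) -> int:
    """Sign-extend a 32-bit value into an unsigned 64-bit register value."""
    word &= 0xFFFFFFFF              # keep only the low 32 bits
    if word & 0x80000000:           # bit 31 set: the word is negative
        word |= 0xFFFFFFFF00000000  # replicate the sign bit into bits 63..32
    return word

print(hex(sign_extend_32_to_64(0x7FFFFFFF)))  # positive word is unchanged
print(hex(sign_extend_32_to_64(0xFFFFFFF0)))  # negative word gets its upper half filled with ones
```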
However, this is what it is getting lifted to:
IRSB {
t0:Ity_I64 t1:Ity_I64
00 | ------ IMark(0x14d4c, 4, 0) ------
01 | t0 = LDle:I64(0x0000000000014d70)
02 | PUT(x15) = t0
NEXT: PUT(pc) = 0x0000000000014d50; Ijk_Boring
}
When executed by SimulationManager, a full 64-bit value gets loaded into x15 instead of a sign-extended 32-bit one.
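The difference shows up whenever the word is negative or the four bytes following it are non-zero. A rough illustration using only the standard struct module; the memory contents below are invented sample bytes, not taken from the binary in question.

```python
import struct

# Hypothetical 8 bytes at the literal address: a negative 32-bit value
# followed by four bytes of unrelated data.
memory = bytes.fromhex("f0ffffff" "78563412")

# What the incorrect lifting does: a single 64-bit little-endian load.
(wrong,) = struct.unpack_from("<Q", memory, 0)

# What ldrsw should do: a signed 32-bit load, widened to 64 bits.
(word,) = struct.unpack_from("<i", memory, 0)
right = word & 0xFFFFFFFFFFFFFFFF

print(hex(wrong))  # the unrelated bytes leak into the upper half
print(hex(right))  # the upper half is pure sign extension
```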
The binary I'm reverse engineering right now uses this instruction to compute branch addresses dynamically (as an obfuscation technique), so the incorrect lifting breaks all analyses.
For comparison, here is Ghidra's pcode which gets it right:
$U5490:4 = LOAD ram(0x114d70:8)
x15 = INT_SEXT $U5490:4
Here's the Python code used to reproduce the VEX output:
import pyvex
import archinfo
import capstone
code = b'\x2f\x01\x00\x98'
addr = 0x14d4c
irsb = pyvex.lift(code, addr, archinfo.ArchAArch64())
irsb.pp()
md = capstone.Cs(capstone.CS_ARCH_ARM64, capstone.CS_MODE_ARM)
for (address, size, mnemonic, op_str) in md.disasm_lite(code, addr):
    print("0x%x:\t%s\t%s" % (address, mnemonic, op_str))
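As a sanity check that these four bytes really are a literal-form ldrsw, the encoding can also be decoded by hand. The sketch below assumes the A64 load-register (literal) layout, in which bits 31..24 are 0x98 for LDRSW, bits 23..5 hold a signed 19-bit word offset, and bits 4..0 hold the destination register. decode_ldrsw_literal is a hypothetical helper, not a pyvex or capstone API.

```python
import struct

def decode_ldrsw_literal(code: bytes, addr: int):
    """Return (destination register number, literal address) for LDRSW (literal)."""
    (insn,) = struct.unpack("<I", code)   # A64 instructions are little-endian words
    if insn >> 24 != 0x98:
        raise ValueError("not an LDRSW (literal) encoding")
    rt = insn & 0x1F                      # destination register, bits 4..0
    imm19 = (insn >> 5) & 0x7FFFF         # word offset, bits 23..5
    if imm19 & 0x40000:                   # offset is signed: fix up negative values
        imm19 -= 0x80000
    return rt, addr + imm19 * 4           # offset is counted in 4-byte words

rt, target = decode_ldrsw_literal(b'\x2f\x01\x00\x98', 0x14d4c)
print("x%d, 0x%x" % (rt, target))  # prints "x15, 0x14d70"
```

The decoded target matches the address used by the load in the lifted IRSB, so only the load width and the missing sign extension are in question.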
Cheers,
Vadim
PS. I'm not sure if I'm at liberty to share the full binary, but aside from that I'll be happy to provide any additional information on the matter.
Hey! Sorry it took so long to get back to you. I've fixed libvex to lift the instruction correctly:
In [3]: code = b'\x2f\x01\x00\x98'
In [4]: addr = 0x14d4c
In [5]: irsb = pyvex.lift(code, addr, archinfo.ArchAArch64())
In [6]: irsb.pp()
IRSB {
t0:Ity_I64 t1:Ity_I32 t2:Ity_I64
00 | ------ IMark(0x14d4c, 4, 0) ------
01 | t1 = LDle:I32(0x0000000000014d70)
02 | t0 = 32Sto64(t1)
03 | PUT(x15) = t0
NEXT: PUT(pc) = 0x0000000000014d50; Ijk_Boring
}