angr/pyvex

PyVex fails decoding on Thumb Instructions

Hi all,

I have a simple Thumb basic block like this:

.text:00001DCA 47 F8 AC 0C     STR.W           R0, [R7,#-0xAC]
.text:00001DCE 68 46           MOV             R0, SP
.text:00001DD0 A0 F1 08 0D     SUB.W           SP, R0, #8
.text:00001DD4 69 46           MOV             R1, SP

I tried to create a Block and dump its IR statements with project.factory.block(0x401DCB).vex.pp(). The result is:

   00 | ------ IMark(0x401dca, 4, 1) ------
   01 | t2 = GET:I32(r7)
   02 | t3 = Sub32(t2,0x000000ac)
   03 | t4 = GET:I32(r0)
   04 | STle(t3) = t4
   05 | ------ IMark(0x401dce, 2, 1) ------
   06 | t7 = GET:I32(sp)
   07 | PUT(r0) = t7
   08 | ------ IMark(0x401dd0, 0, 1) ------
   09 | PUT(itstate) = 0x00000000
   NEXT: PUT(pc) = 0x00401dd1; Ijk_NoDecode

Why does PyVEX fail to decode the instruction at 0x401DD1?

Thank you!
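For readers mapping the addresses in the IR back to the listing: the odd addresses (0x401DCB passed to block(), 0x401DD1 in the NoDecode exit) carry the Thumb bit in bit 0, so the instruction that failed to decode is the one at 0x401DD0. A minimal sketch of that arithmetic follows. The helper name is hypothetical, and the 0x400000 load base is an assumption inferred from 0x401DCA versus .text:00001DCA in the listing.

```python
# Hypothetical helper, not part of angr or pyvex.
def split_thumb_address(addr: int) -> tuple[int, bool]:
    """Return (instruction address, is_thumb) for an address whose bit 0 is the Thumb bit."""
    return addr & ~1, bool(addr & 1)

# Assumption: load base inferred from 0x401DCA (IMark) vs 0x1DCA (listing).
BASE = 0x400000

insn_addr, is_thumb = split_thumb_address(0x401DD1)
listing_addr = insn_addr - BASE

print(hex(insn_addr), is_thumb, hex(listing_addr))
```

With these assumptions, the failing address corresponds to .text:00001DD0, the SUB.W SP, R0, #8 line.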

The instruction format seems to be unsupported by libvex, which is concerning. I'll take a look.

From what I can tell from reading the arm instruction reference, that instruction is actually invalid. At least according to the spec, having that format of sub instruction write to the stack pointer is UNPREDICTABLE.

Do you have a demonstration of ARM hardware running that instruction? If not, I would assume it is a genuinely bad instruction and the disassembly is at fault.

My interpretation of the (terrible, horrible, no-good, very-bad) ARM encoding documentation is that this was meant to be encoding T3 (page 4-365) of the Thumb-2 Supplement Reference Manual: http://class.ece.iastate.edu/cpre288/resources/docs/Thumb-2SupplementReferenceManual.pdf
...and if the compiler author had read the thing, they would have used T2 on page 4-369 instead, per the note in the text. The manual doesn't call this out as specifically UNPREDICTABLE (d is not BadReg or PC), but given how it is worded, I'd say we could treat it as such.
Aw snap! An ambiguity in an ARM spec? Inconceivable! :P

My advice: Sounds like a job for... Spotters!

d is BadReg though; BadReg is SP or PC

Wait... you're right! SP is d, not R0
Yeah, then this is just unpredictable, that's that.
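To make the "SP is d" point concrete, here is a rough sketch that pulls the fields out of the four bytes A0 F1 08 0D by hand. The bit layout used (11110 i 0 1101 S Rn, then 0 imm3 Rd imm8) is my reading of the SUB (immediate) T3 encoding and should be checked against the manual; the function name is hypothetical, and the BadReg test simply follows the definition given above (SP or PC).

```python
import struct

# Hypothetical helper; field layout assumed from the SUB (immediate) T3 encoding:
#   halfword 1: 1 1 1 1 0 i 0 1 1 0 1 S Rn(4)
#   halfword 2: 0 imm3(3) Rd(4) imm8(8)
def decode_sub_imm_t3(raw: bytes) -> dict:
    hw1, hw2 = struct.unpack("<HH", raw)  # Thumb-2 is two little-endian halfwords
    if (hw1 & 0xFBE0) != 0xF1A0 or (hw2 & 0x8000):
        raise ValueError("not a SUB (immediate) T3 encoding")
    i = (hw1 >> 10) & 1
    rd = (hw2 >> 8) & 0xF
    return {
        "S": (hw1 >> 4) & 1,
        "rn": hw1 & 0xF,
        "rd": rd,
        # Raw i:imm3:imm8 field; the modified-immediate expansion is not applied here.
        "imm12": (i << 11) | (((hw2 >> 12) & 0x7) << 8) | (hw2 & 0xFF),
        # BadReg as defined in this thread: SP (13) or PC (15).
        "rd_is_badreg": rd in (13, 15),
    }

fields = decode_sub_imm_t3(bytes.fromhex("A0F1080D"))
print(fields)
```

Under that layout the destination field comes out as 13 (SP) and the source as 0 (R0), which matches the disassembly SUB.W SP, R0, #8.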

You could still spot it, but I'm a bit worried as to why such an instruction would be flying around in the first place

What do an ARM processor and QEMU say?

> From what I can tell from reading the arm instruction reference, that instruction is actually invalid. At least according to the spec, having that format of sub instruction write to the stack pointer is UNPREDICTABLE.
>
> Do you have a demonstration of ARM hardware running that instruction? If not, I would assume it is a genuinely bad instruction and the disassembly is at fault.

Hi rhelmot,

Thank you for your reply.

These instructions are part of a library compiled with an OLLVM-like compiler, and they run fine on my Pixel.

.text:00001DCA 47 F8 AC 0C     STR.W           R0, [R7,#-0xAC]
.text:00001DCE 68 46           MOV             R0, SP
.text:00001DD0 A0 F1 08 0D     SUB.W           SP, R0, #8
.text:00001DD4 69 46           MOV             R1, SP

Hello all,

Thank you for all your help with my issue. I've been trying to solve some real-world problems using angr. I think it can do much more than solving toy-like CTF crackmes.

Instructions like SUB.W SP, R0, #8 are very common in binaries generated by custom OLLVM compilers, so I hope PyVEX can decode them correctly. Attached is a simple program that calculates an MD5 hash; it contains instructions that reproduce the issue.

./hello "hello world"

.text:000036CC loc_36CC                                ; DATA XREF: .text:0000357C↑o
.text:000036CC                 MOV             R0, SP
.text:000036CE                 SUB.W           SP, R0, #8
.text:000036D2                 MOV             R1, SP
.text:000036D4                 SUB.W           SP, R1, #8

hello.zip
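For anyone who wants a rough idea of how often this pattern shows up in a binary, the sketch below scans raw Thumb code for the SUB (immediate) T3 pattern with Rd = SP and Rn other than SP. Several things here are assumptions: the function name is hypothetical, the bit masks come from my reading of the T3 encoding, and the sample bytes were assembled by hand from the listing above (only 68 46, A0 F1 08 0D and 69 46 appear verbatim in this thread). A halfword-by-halfword scan ignores instruction boundaries, so it can report false positives on real code.

```python
import struct

# Hypothetical scanner; masks assumed from the SUB (immediate) T3 encoding.
def find_sub_sp_from_reg(code: bytes, base: int = 0) -> list[tuple[int, int]]:
    """Return (address, Rn) for each halfword-aligned match of SUB.W SP, Rn, #imm with Rn != SP."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        hw1, hw2 = struct.unpack_from("<HH", code, off)
        if (hw1 & 0xFBE0) != 0xF1A0 or (hw2 & 0x8000):
            continue  # not the T3 SUB (immediate) pattern
        rn = hw1 & 0xF
        rd = (hw2 >> 8) & 0xF
        if rd == 13 and rn != 13:
            hits.append((base + off, rn))
    return hits

# Hand-assembled guess at the bytes of loc_36CC:
# MOV R0, SP / SUB.W SP, R0, #8 / MOV R1, SP / SUB.W SP, R1, #8
sample = bytes.fromhex("6846" "A0F1080D" "6946" "A1F1080D")
hits = find_sub_sp_from_reg(sample, base=0x36CC)
print([(hex(addr), rn) for addr, rn in hits])
```

A spotter or lifter fix would of course work on decoded instructions rather than on a byte scan; this is only meant for a quick survey.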

Just pushed the fix. Your code should lift correctly now.

> I think it can do much more than solving toy-like CTF crackmes.

so do we!

Hi rhelmot,

Thank you for your work! This is also a good chance for me to learn pyvex's internals from your fix. I hope I can do it myself next time :)