angr/pyvex

[x86_64] [AVX2]: Unknown register with offset

SjRNMzU opened this issue · 9 comments

I am using the latest git version of Claripy on Linux Debian Sid with Python 3.8. I am getting an unknown register name or register size when using the x86_64 instruction vmovdqa with xmm registers.

The following examples reproduce this error:

Example 1: Unknown register 240 and 272

import archinfo
import pyvex

data  = b'\xc5\xfao\x06\xc5\xfaoI\xf0\xc5\xfa\x7f\x07\xc4\xc1z\x7fI\xf0\xc3'
vaddr = 4441805  # 0x43c6cd
irsb  = pyvex.IRSB(data, vaddr, archinfo.ArchAMD64(), opt_level=2)

irsb.pp()
IRSB {                                                                      
t0:Ity_V128 t1:Ity_I64 t2:Ity_V128 t3:Ity_I64 t4:Ity_V128 t5:Ity_I64 t6:Ity_V128 t7:Ity_I64 t8:Ity_I64 t9:Ity_I64 t10:Ity_I64 t11:Ity_I64 t12:Ity_I64 t13:Ity_I64 t14:Ity_I64 t15:Ity_I64 t16:Ity_I64
                                                                                
------ IMark(0x43c6cd, 4, 0) ------                                 
t1 = GET:I64(rsi)                                                   
t0 = LDle:V128(t1)                                                  
PUT(xmm0) = t0                                                      
PUT(240) = 0                                                        
PUT(rip) = 0x000000000043c6d1                                       
------ IMark(0x43c6d1, 5, 0) ------                                 
t12 = GET:I64(rcx)                                                  
t11 = Add64(t12,0xfffffffffffffff0)                                 
t2 = LDle:V128(t11)                                                 
PUT(xmm1) = t2                                                      
PUT(272) = 0                                                        
PUT(rip) = 0x000000000043c6d6         
...

##capstone disassembly
'vmovdqu xmm0, xmmword ptr [rsi]',
'vmovdqu xmm1, xmmword ptr [rcx - 0x10]',                                  
'vmovdqu xmmword ptr [rdi], xmm0',                                         
'vmovdqu xmmword ptr [r9 - 0x10], xmm1',                                   
'ret'

Example 2: Fails to decode

data = b'b\xf1|H\x10a\xfeb\xf1|H\x10i\xffM\x89\xc8I\x83\xe1\x80M)\xc8L)\xc1L)\xc2M\x01\xc8'
vaddr = 0x000000000043cc10
irsb  = pyvex.IRSB(data, vaddr, archinfo.ArchAMD64(), opt_level=2)
irsb.pp()
IRSB {                                                                      
      NEXT: PUT(rip) = 0x000000000043cc1f; Ijk_NoDecode                        
}  

##capstone disassembly
'vmovups zmm4, zmmword ptr [rcx - 0x80]',                                  
'vmovups zmm5, zmmword ptr [rcx - 0x40]',                                  
'mov r8, r9',                                                              
'and r9, 0xffffffffffffff80',                                              
'sub r8, r9',                                                              
'sub rcx, r8',                                                             
'sub rdx, r8',                                                             
'add r8, r9'

Example 3: InstPut is to a register with an unknown offset (VEX info not shown, InstPut(272)).

data  = b'\xc5\xfao\x06\xc5\xfaoI\xf0\xc5\xfa\x7f\x07\xc4\xc1z\x7fI\xf0\xc3'    
vaddr =  4441805 
irsb  = pyvex.IRSB(data, vaddr, archinfo.ArchAMD64(), opt_level=2)

##corresponding asm decoded with capstone
'vmovdqu xmm0, xmmword ptr [rsi]',                                             
'vmovdqu xmm1, xmmword ptr [rcx - 0x10]',                                      
'vmovdqu xmmword ptr [rdi], xmm0',                                             
'vmovdqu xmmword ptr [r9 - 0x10], xmm1',                                       
'ret'

On your second example (question), LibVEX does not support AVX512 instructions yet. Related issue: angr/angr#1386

We plan to merge in an official patch that at least makes LibVEX decode AVX512 instructions, but this is very low priority.

Why do you need every single register to have a name? The part that you're seeing is the high half of the ymm register whose xmm part is being zeroed. If you really want, I can add names like ymm0hx for this, but that just seems silly.
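To illustrate where the unnamed offsets come from, here is a minimal sketch of the arithmetic. It assumes that ymm0 starts at guest-state offset 224 and that each ymm register occupies 32 bytes, with the xmm alias in the low 16 bytes. That layout is not stated in this thread, but it is consistent with the offsets 240 and 272 seen in Example 1. The constant and function names are made up for illustration.

```python
# Assumed layout (not stated in the thread): ymm0 begins at guest-state
# offset 224, every ymm register is 32 bytes wide, and xmmN aliases the
# low 16 bytes of ymmN.
YMM0_OFFSET = 224
YMM_SIZE = 32
XMM_SIZE = 16


def ymm_layout(index):
    """Return (xmm_offset, high_half_offset) for ymm<index>."""
    base = YMM0_OFFSET + index * YMM_SIZE
    return base, base + XMM_SIZE


for i in range(2):
    low, high = ymm_layout(i)
    print(f"ymm{i}: xmm{i} at offset {low}, high half at offset {high}")
```

Under these assumptions, `PUT(240) = 0` and `PUT(272) = 0` are the writes that clear the upper 128 bits of ymm0 and ymm1 after the VEX-encoded loads into xmm0 and xmm1.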

@rhelmot I've built my own analysis on top of pyvex. If the register isn't named then my analysis doesn't know which register to write the variable to.

I will take a look at the source code of pyvex and archinfo to better understand what is going on. Where/how is the offset being calculated and used?

@rhelmot Thanks! I'll look into the VEX register file abstraction.

I've already added the subregisters as described in archinfo/arch_amd64.py; however, I don't know how to translate between register 240 and its name or size.

@rhelmot Unfortunately that's the problem. arch.translate_register_name does not resolve the register name and defaults to using str(offset). The x86_64 register name "240" obviously doesn't exist.

The VEX IR is trying to refer to bits 128:256 of the ymm0 register. I'm not sure if this has a valid name under the x86_64 specification.
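For an analysis that needs some name for every write, one possible workaround is to fall back to the enclosing register plus a bit range when no exact name exists. The sketch below is not archinfo's API: `REGISTERS` is a hypothetical hand-written table (the offsets are assumptions chosen to match Example 1), and `describe_offset` is an illustrative helper.

```python
# Hypothetical register table: name -> (byte offset, size in bytes).
# The offsets are assumptions chosen to match the output in Example 1.
REGISTERS = {
    "ymm0": (224, 32),
    "xmm0": (224, 16),
    "ymm1": (256, 32),
    "xmm1": (256, 16),
}


def describe_offset(offset, size, registers=REGISTERS):
    """Name a guest-state access, falling back to 'reg[lo:hi]' in bits."""
    # Prefer a register that matches the access exactly.
    for name, (reg_off, reg_size) in registers.items():
        if reg_off == offset and reg_size == size:
            return name

    # Otherwise pick the smallest register that fully contains the access.
    containing = [
        (reg_size, name, reg_off)
        for name, (reg_off, reg_size) in registers.items()
        if reg_off <= offset and offset + size <= reg_off + reg_size
    ]
    if not containing:
        return str(offset)  # nothing known covers this offset

    _, name, reg_off = min(containing)
    lo_bit = (offset - reg_off) * 8
    return f"{name}[{lo_bit}:{lo_bit + size * 8}]"


print(describe_offset(240, 16))
print(describe_offset(272, 16))
print(describe_offset(224, 16))
```

With this table, the 16-byte write at offset 240 is reported as a slice of ymm0 rather than as the bare string "240".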

This is actually a problem with AVX2, not AVX512, and I've updated the issue title accordingly.

I added the registers in the linked commit and now your first example renders as expected. For your second example, valgrind does not yet support avx512, so vex won't either. There was a project a while back to add support, but the manpower behind it went away.

Perfect. The commit looks good!
Thanks.