eurecom-s3/symqemu

sym_load_guest_internal does not distinguish signed and unsigned values

Opened this issue · 1 comments

We found that the movsx assembly instruction is not correctly translated by syqemu. More specifically, the movsx istruction copies the contents of the source operand to the destination operand with sign extension, however, the translated tcg ops conduct a zero extension, as shown below:

// g_26 is an int8_t variable 
// the value of g_26 is 0xae
0x40126D: movsx   r9d, g_26
// will be translated into -->
 ---- 000000000040126d 0000000000000018
 movi_i64 tmp2_expr,$0x0
 movi_i64 tmp2,$0x404044
 qemu_ld_i64 tmp0,tmp2,sb,0
 movi_i64 tmp12_expr,$0x0
 movi_i64 tmp12,$0x0
 movi_i64 tmp13_expr,$0x0
 movi_i64 tmp13,$0x1
 call sym_load_guest_i64,$0x1,$1,tmp0_expr,env,tmp2,tmp2_expr,tmp13,tmp12
 movi_i64 tmp12_expr,$0x0
 movi_i64 tmp12,$0x4
 call sym_zext,$0x5,$1,r9_expr,tmp0_expr,tmp12
 ext32u_i64 r9,tmp0

The sym_zext and ext32u_i64 at the end is inconsitent with the semantic of movsx instruction and could lead to inaccurate symbolic constraints.

After cheking the source code of symqemu, we found that the sym_load_guest_internal function does not distinguish between the signed value and the unsigned value and it will zero extend the loaded value whenever the load_length is not equal to result_length. This behavior is unusual and potentially buggy.

One possible fix could be to:

  1. modify tcg_gen_qemu_ld_i{32, 64} replacing load_size with memop (which encodes the length and the sign).
  2. modify sym_load_guest_i{32, 64} to propagate mmop to sym_load_guest_internal
  3. modify sym_load_guest_internal in order to retrieve from mmop both the load_size and sign
  4. modify sym_load_guest_internal to perform a sign extension when needed

Let me know if I can one PR along this direction.