sym_load_guest_internal does not distinguish signed and unsigned values
Opened this issue · 1 comments
We found that the movsx
assembly instruction is not correctly translated by syqemu. More specifically, the movsx
istruction copies the contents of the source operand to the destination operand with sign extension, however, the translated tcg ops conduct a zero extension
, as shown below:
// g_26 is an int8_t variable
// the value of g_26 is 0xae
0x40126D: movsx r9d, g_26
// will be translated into -->
---- 000000000040126d 0000000000000018
movi_i64 tmp2_expr,$0x0
movi_i64 tmp2,$0x404044
qemu_ld_i64 tmp0,tmp2,sb,0
movi_i64 tmp12_expr,$0x0
movi_i64 tmp12,$0x0
movi_i64 tmp13_expr,$0x0
movi_i64 tmp13,$0x1
call sym_load_guest_i64,$0x1,$1,tmp0_expr,env,tmp2,tmp2_expr,tmp13,tmp12
movi_i64 tmp12_expr,$0x0
movi_i64 tmp12,$0x4
call sym_zext,$0x5,$1,r9_expr,tmp0_expr,tmp12
ext32u_i64 r9,tmp0
The sym_zext
and ext32u_i64
at the end is inconsitent with the semantic of movsx instruction and could lead to inaccurate symbolic constraints.
After cheking the source code of symqemu, we found that the sym_load_guest_internal
function does not distinguish between the signed value and the unsigned value and it will zero extend the loaded value whenever the load_length is not equal to result_length. This behavior is unusual and potentially buggy.
One possible fix could be to:
- modify
tcg_gen_qemu_ld_i{32, 64}
replacingload_size
withmemop
(which encodes the length and the sign). - modify
sym_load_guest_i{32, 64}
to propagatemmop
tosym_load_guest_internal
- modify
sym_load_guest_internal
in order to retrieve frommmop
both theload_size
andsign
- modify
sym_load_guest_internal
to perform a sign extension when needed
Let me know if I can one PR along this direction.