eurecom-s3/symqemu

Symbolization of QEMU helpers with SymCC

ercoppa opened this issue · 1 comments

SymQEMU ignores the effects of most QEMU helpers. Some of them, especially on i386/x86_64, are quite common when analyzing real-world programs. Manually writing symbolic helpers is quite hard in several most cases. Hence, another approach could be:

  1. Build a dynamic library (e.g., libsymhelpers.so) containing the code of the QEMU helpers instrumented with SymCC.

  2. Modify the build configuration of QEMU to (optionally) link this library`

  3. If the library is found at configure time, then a macro CONFIG_SYM_HELPERS is set, which enables a few changes in, e.g., target/i386/translate.c. For instance:

    • Helpers that only read/write XMM registers: we can just make a call to our symbolized version since each XMM register is modeled by QEMU as a buffer in memory and thus SymCC can naturally cope with the accesses to these buffers. The arguments of the helper will be pointers to the buffers, hence, before we call our symbolized helper, we still have to call another helper that concretizes the arguments: it should call _sym_set_parameter_expression(N, NULL).

    • Helpers that also read/write general-purpose registers: the idea is pretty much the same with the exception that (a) before calling our symbolized helper we have to call a helper that calls _sym_set_parameter_expression(N, expr) to propagate the expressions of the symbolic TCG temps to the symbolic arguments, (b) after the call, we have to call a helper that retrieves the symbolic return expression with _sym_get_return_expression() and propagates it to the output TCG temp that should contain the resulting symbolic expression.

If the helper has an output value we have to skip the concretization performed by tcg_gen_callN.

What do you think?

We already have a PoC of this strategy in one fork of SymQEMU that we can show. However, before making a PR, I believe it makes to see if this is an approach that we actually want to consider since there are a few downsides (besides the changes in translate.c, we also have to tinker with the build workflow since our library requires a few headers generated during the QEMU build process).

Notice that this proposal may help fix #13 (I tested it), #12 (I am not sure), #6 (It can for sure improve the situation).