RolfRolles/HexRaysDeob

the difference between Microcode Explorer output and optblock_t::func callback dump

TakahiroHaruyama opened this issue · 4 comments

I'm implementing control flow unflattening at a later maturity level, related to #7.

I like to debug the code using the Microcode Explorer graph, but sometimes (especially at MMAT_GLBOPT1) the output generated by Microcode Explorer differs from the optblock_t::func callback dump at the same maturity level (e.g., dumpBefore-MMAT_GLBOPT1-0.txt), so I can't rely on the graph when debugging.

Do you know the reason?

No, I don't know what you're talking about.

Sorry, I've attached an example graph of a function at MMAT_GLBOPT1, generated by Microcode Explorer.
As we can see, the control dispatcher block ID is 14.
[Screenshot: Microcode Explorer graph of the function at MMAT_GLBOPT1]

On the other hand, according to the information dumped by the optblock_t::func callback at the same level, the dispatcher ID is 9.

9. 0 ; 2WAY-BLOCK 9 INBOUNDS: 1 6 7 2 8 12 13 4 5 OUTBOUNDS: 10 14 [START=73F4211A END=73F42122] MINREFS: STK=24/ARG=128, MAXBSP: 0
9. 0 ; USE: edx.4
9. 0 ; VALRANGES: edx.4:(==251E6FCF|==6A786FA9|==A39DE200|==B8230B61|==D5FFDD16|==E0408B29|==E41FBF89|==F5AE3BEE)
9. 0 jle edx.4, #0xF5AE3BED.4, @14 ; 73F42120 u=edx.4
9. 0

So I'd like to know why there is a difference between them.

As it happens, I can sort of answer this question, but only because I've been reverse engineering Hex-Rays. This question isn't really related to an issue with the code I released for this project. It's pretty much just a generic question about Hex-Rays internals. You'll probably get better answers from Hex-Rays support.

Basically, the microcode explorer works by calling gen_microcode to produce an mbl_array_t at the specified maturity level, and then it stops immediately once that maturity level is reached. However, in ordinary operation of the decompiler, once the mbl_array_t has reached MMAT_GLBOPT1, Hex-Rays continues to optimize and transform the mbl_array_t before it reaches the next maturity level, MMAT_GLBOPT2.

In particular, in Hex-Rays 7.1, after reaching MMAT_GLBOPT1, the decompiler resolves stack variable addresses, refines the input argument sizes, triggers block combination, performs common subexpression elimination, preallocates local variables, and does some other stuff that I haven't reverse engineered yet. Only after all of this is done does the decompiler update the maturity level of the mbl_array_t to MMAT_GLBOPT2. Your optblock_t handler is getting called somewhere after it reached MMAT_GLBOPT1, but before it has reached MMAT_GLBOPT2.

So, the reason the microcode explorer shows different results than something you dumped in an optblock_t handler is that, by the time your optblock_t handler is called, the mbl_array_t is not in the same state as it was when it originally reached MMAT_GLBOPT1 -- further transformations have taken place since reaching MMAT_GLBOPT1.
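To make the timing concrete, here is a toy model of the behavior described above. It is only a sketch and does not use the Hex-Rays SDK: `ToyMba`, `explorer_view`, `full_run`, and `post_glbopt1_passes` are hypothetical names invented for illustration, the pass list simply repeats the steps named above, and firing the callback after every pass is an assumption, since the exact point at which the real handler runs is not something this thread establishes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins only; none of these are Hex-Rays SDK types.
enum class Maturity { GLBOPT1, GLBOPT2 };

struct ToyMba {
    Maturity maturity;                 // label currently attached to the microcode
    std::vector<std::string> applied;  // passes run since GLBOPT1 was reached
};

using BlockCallback = std::function<void(const ToyMba&)>;

// Steps named in the comment above; the order here is illustrative.
inline const std::vector<std::string>& post_glbopt1_passes() {
    static const std::vector<std::string> passes = {
        "resolve stack variable addresses",
        "refine input argument sizes",
        "combine blocks",
        "common subexpression elimination",
        "preallocate local variables",
    };
    return passes;
}

// Explorer-like path: stop the moment GLBOPT1 is reached.
ToyMba explorer_view() {
    return ToyMba{Maturity::GLBOPT1, {}};
}

// Decompiler-like path: keep transforming while the label still says GLBOPT1.
// Assumption: the callback fires after each pass.
ToyMba full_run(const BlockCallback& cb) {
    assert(static_cast<bool>(cb));  // a callback must be supplied
    ToyMba mba{Maturity::GLBOPT1, {}};
    for (const std::string& pass : post_glbopt1_passes()) {
        mba.applied.push_back(pass);
        cb(mba);  // observer sees GLBOPT1, but the state has already changed
    }
    mba.maturity = Maturity::GLBOPT2;  // label is updated only at the very end
    return mba;
}
```

In this model, every state handed to the callback still reports `Maturity::GLBOPT1`, yet none of them matches what `explorer_view()` returns, because at least one pass has already run. That is the same kind of mismatch as a dispatcher block being numbered 14 in one view and 9 in the other.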

Thank you so much!
I understand now that I should ask Hex-Rays support about questions on decompiler internals.