MatthieuDartiailh/bytecode

Stack size with EXTENDED_ARG

Closed this issue · 10 comments

IPSW1 commented

Hi,
I stumbled upon a problem with the stack size computation in combination with the EXTENDED_ARG instruction. Take for example this code snipped:

p = [1, 2, 3, 4, 5, 6]
q, r, *s, t = p
print(q, r, s, t)
>>> 1 2 [3, 4, 5] 6

The disassembly looks like this:

1           0 LOAD_CONST               1 (1)
            2 LOAD_CONST               2 (2)
            4 LOAD_CONST               3 (3)
            6 LOAD_CONST               4 (4)
            8 LOAD_CONST               5 (5)
            10 LOAD_CONST              6 (6)
            12 BUILD_LIST              6
            14 STORE_FAST              1 (p)
                  
2          16 LOAD_FAST                1 (p)
           18 EXTENDED_ARG             1
           20 UNPACK_EX              258
           22 STORE_FAST               2 (q)
           24 STORE_FAST               3 (r)
           26 STORE_FAST               4 (s)
           28 STORE_FAST               5 (t)
                  
3          30 LOAD_GLOBAL              0 (print)
           32 LOAD_FAST                2 (q)
           34 LOAD_FAST                3 (r)
           36 LOAD_FAST                4 (s)
           38 LOAD_FAST                5 (t)
           40 CALL_FUNCTION            4
           42 POP_TOP
           44 LOAD_CONST               0 (None)
           46 RETURN_VALUE

Now generating the bytecode from this code and converting back to a code object fails due to wrong stack size computation, when the bytecode is generated with the parameter to include extended arguments here set to True (I specifically need this behavior for the bytecode to match the disassembly).

The problem seems to be the use of the EXTENDED_ARG instruction, which is not handled correctly. The stack size computation treats UNPACK_EX as a single instruction and takes into account only the lower byte (corresponding to the number of variables before the list value, i.e. variables q, r), but not the higher byte provided by EXTENDED_ARG (corresponding to the number of values after the list value, i.e. variable t). This leads to a negative stack size.

Is this a known problem and is there a solution or a workaround for this, except just excluding the EXTENDED_ARG instruction?

(For reference: UNPACK_EX documentation, stack effect of UNPACK_EX)

EDIT: Ok, an easy workaround would be to generate two bytecode objects, one without and one with the parameter for extended arguments. I can generate a code object from the one without EXTENDED_ARG and keep the other one for reference and comparison with the disassembly. Nevertheless, I don't think that is the nicest option here.

I will try to have a look either this week-end or next Wednesday night. Feel free to ping me if you do not hear from me.

IPSW1 commented

Alright, thank you. I tried to do a fix in this commit, which is meant to be as general as possible. It resolves the extended arguments and temporarily changes the instruction argument to the resolved value. It seems that the dis module is doing it in this way, which is why the stack effect is computed correctly for the above example.

I did not test any possible side effects. But I also don't think this will make the problem worse, since it does nothing if there are not extended arguments. However, there still remain problems when including EXTENDED_ARG instructions, for example when there are more than 256 names in a code object or if there are long jumps.

Can you open a PR with this commit ? Also if you can come up with a bunch of failing unit test it would be helpful to help improve support for EXTENDED_ARG.

IPSW1 commented

I did here #64. Was just worried about my code quality, that's why I hesitated :)

There are also three added test cases, but only one of them is expected to fail. It seems like this has something to do with a split of huge block and a wrong start size for one block, but I am not entirely sure.

EDIT: Looks like this test case only fails for versions from 3.6.

I finally found some time to look at this in details. My conclusion is that the issue is that EXTENDED_ARG are propagated to higher level representation of the bytecode namely Bytecode and ControlFlowGraph which should not happen. EXTENDED_ARG is a low-level detail and should be ignored in those high level representation.

I will work to fix the ConcreteBytecode.to_bytecode method to eliminate all EXTENDED_ARG if any is present which should address all issues related to them. For symmetry I will try to add an extended_args to Bytecode.to_concrete_bytecode to allow to generate a ConcreteBytecode with EXTENDED_ARG.

If you have any example that uses EXTENDED_ARG and that fails when doing Bytecode.from_code(code).to_code() please let me know because that would be a different kind of bug and not fixed by the above.

I just opened #65 that addresses your issues. All the tests you included in #64 now pass.

@IPSW1 I noticed you closed #64. Can you confirm that #65 addresses all your issues ?

IPSW1 commented

TL;DR: Yes, it does.

The thing is I just realized that I wanted to have the benefits of both the concrete and the abstract world, i.e. the EXTENDED_ARG and also the convenience with abstraction. As you described in #65, this would lead to possibly a lot of problems regarding stack size computation (for which my bad hack in #64 is a good example).

So I think your proposed solution is completely reasonable for the issues.

@IPSW1 Do you need a new release now that #65 has been merged ?

IPSW1 commented

There is no need for a specific release for me.

Thank you for the fix!