MatthieuDartiailh/bytecode

`ValueError: ('TryBegin target must a BasicBlock, got %s', 'Label')` when using `Label` with `TryBegin`

Opened this issue · 14 comments

I am trying to use a TryBegin with a Label. According to the documentation, the target argument can either be a Label or a BasicBlock. However, when using a Label I get

ValueError: ('TryBegin target must a BasicBlock, got %s', 'Label')

This is with bytecode 0.15.0.

It depends if the TryBegin within a Bytecode object (use a Label) or within a Controlflowgraph (use a BasicBlock). Hope this helps.

I'm using it in an abstract Bytecode object in this case

Can you post a minimal reproducer ?

I'll try to make one as this comes from a rather convoluted bit of code. In the meantime, this is the traceback from pytest

.riot/venv_py3114/lib/python3.11/site-packages/bytecode/bytecode.py:306: in to_code
    stacksize = cfg.compute_stacksize(
.riot/venv_py3114/lib/python3.11/site-packages/bytecode/cfg.py:547: in compute_stacksize
    args = coro.send(None)  # type: ignore
.riot/venv_py3114/lib/python3.11/site-packages/bytecode/cfg.py:280: in run
    for i, instr in enumerate(self.block):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = [<RETURN_GENERATOR location=InstrLocation(lineno=232, end_lineno=None, col_offset=None, end_col_offset=None)>, <POP_TO...>, <LOAD_FAST arg='__ddgen' location=InstrLocation(lineno=232, end_lineno=None, col_offset=None, end_col_offset=None)>]

    def __iter__(self) -> Iterator[Union[Instr, SetLineno, TryBegin, TryEnd]]:
        index = 0
        while index < len(self):
            instr = self[index]
            index += 1
    
            if not isinstance(instr, (SetLineno, Instr, TryBegin, TryEnd)):
                raise ValueError(
                    "BasicBlock must only contain SetLineno and Instr objects, "
                    "but %s was found" % instr.__class__.__name__
                )
    
            if isinstance(instr, Instr) and instr.has_jump():
                if index < len(self) and any(
                    isinstance(self[i], Instr) for i in range(index, len(self))
                ):
                    raise ValueError(
                        "Only the last instruction of a basic " "block can be a jump"
                    )
    
                if not isinstance(instr.arg, BasicBlock):
                    raise ValueError(
                        "Jump target must a BasicBlock, got %s",
                        type(instr.arg).__name__,
                    )
    
            if isinstance(instr, TryBegin):
                if not isinstance(instr.target, BasicBlock):
>                   raise ValueError(
                        "TryBegin target must a BasicBlock, got %s",
                        type(instr.target).__name__,
                    )
E                   ValueError: ('TryBegin target must a BasicBlock, got %s', 'Label')

.riot/venv_py3114/lib/python3.11/site-packages/bytecode/cfg.py:71: ValueError

It looks like we cross over to cfg when computing the stack size.

Yes stack size computation can only be done on a cfg. I am curious as to why we fail to replace the label. I will really need a reproducer to investigate this.

This is the smallest I've been able to get so far

import sys
from types import FunctionType

from bytecode import BinaryOp, Bytecode, CompilerFlags, Instr, Label


def _update_map_block(varkwargsname, lineno):
    return [
        Instr("COPY", 1, lineno=lineno),
        Instr("LOAD_METHOD", "update", lineno=lineno),
        Instr("LOAD_FAST", varkwargsname, lineno=lineno),
        Instr("PRECALL", 1, lineno=lineno),
        Instr("CALL", 1, lineno=lineno),
        Instr("POP_TOP", lineno=lineno),
    ]


def _call_return_block(arg, lineno):
    return [
        Instr("PRECALL", arg, lineno=lineno),
        Instr("CALL", arg, lineno=lineno),
        Instr("RETURN_VALUE", lineno=lineno),
    ]


FIRSTLINENO_OFFSET = 1


def wrap_generator(instrs, code, lineno):
    from bytecode import TryBegin, TryEnd

    instrs[0:0] = [
        Instr("RETURN_GENERATOR", lineno=lineno),
        Instr("POP_TOP", lineno=lineno),
    ]

    stopiter = Label()
    loop = Label()
    genexit = Label()
    exc = Label()
    propagate = Label()
    _yield = Label()

    try_stopiter = TryBegin(stopiter, push_lasti=False)
    try_except = TryBegin(genexit, push_lasti=True)

    instrs[-1:] = [
        try_stopiter,
        Instr("COPY", 1, lineno=lineno),
        Instr("STORE_FAST", "__ddgen", lineno=lineno),
        Instr("LOAD_ATTR", "send", lineno=lineno),
        Instr("STORE_FAST", "__ddgensend", lineno=lineno),
        Instr("LOAD_CONST", next, lineno=lineno),
        Instr("LOAD_FAST", "__ddgen", lineno=lineno),
        loop,
        Instr("PRECALL", 1, lineno=lineno),
        Instr("CALL", 1, lineno=lineno),
        TryEnd(try_stopiter),
        _yield,
        try_except,
        Instr("YIELD_VALUE", lineno=lineno),
        Instr("RESUME", 1, lineno=lineno),
        Instr("PUSH_NULL", lineno=lineno),
        Instr("SWAP", 2, lineno=lineno),
        Instr("LOAD_FAST", "__ddgensend", lineno=lineno),
        Instr("SWAP", 2, lineno=lineno),
        Instr("JUMP_BACKWARD", loop, lineno=lineno),
        TryEnd(try_except),
        try_stopiter,
        genexit,  # except GeneratorExit:
        Instr("PUSH_EXC_INFO", lineno=lineno),
        Instr("LOAD_CONST", GeneratorExit, lineno=lineno),
        Instr("CHECK_EXC_MATCH", lineno=lineno),
        Instr("POP_JUMP_FORWARD_IF_FALSE", exc, lineno=lineno),
        Instr("POP_TOP", lineno=lineno),
        Instr("LOAD_FAST", "__ddgen", lineno=lineno),
        Instr("LOAD_METHOD", "close", lineno=lineno),
        Instr("PRECALL", 0, lineno=lineno),
        Instr("CALL", 0, lineno=lineno),
        Instr("SWAP", 2, lineno=lineno),
        Instr("POP_EXCEPT", lineno=lineno),
        Instr("RETURN_VALUE", lineno=lineno),
        exc,  # except:
        Instr("POP_TOP", lineno=lineno),
        Instr("LOAD_FAST", "__ddgen", lineno=lineno),
        Instr("LOAD_ATTR", "throw", lineno=lineno),
        Instr("PUSH_NULL", lineno=lineno),
        Instr("LOAD_CONST", sys.exc_info, lineno=lineno),
        Instr("PRECALL", 0, lineno=lineno),
        Instr("CALL", 0, lineno=lineno),
        Instr("CALL_FUNCTION_EX", 0, lineno=lineno),
        Instr("STORE_FAST", "__value", lineno=lineno),
        Instr("POP_EXCEPT", lineno=lineno),
        Instr("LOAD_FAST", "__value", lineno=lineno),
        Instr("JUMP_BACKWARD", _yield, lineno=lineno),
        TryEnd(try_stopiter),
        stopiter,  # except StopIteration:
        Instr("PUSH_EXC_INFO", lineno=lineno),
        Instr("LOAD_CONST", StopIteration, lineno=lineno),
        Instr("CHECK_EXC_MATCH", lineno=lineno),
        Instr("POP_JUMP_FORWARD_IF_FALSE", propagate, lineno=lineno),
        Instr("POP_TOP", lineno=lineno),
        Instr("POP_EXCEPT", lineno=lineno),
        Instr("LOAD_CONST", None, lineno=lineno),
        Instr("RETURN_VALUE", lineno=lineno),
        propagate,
        Instr("RERAISE", 0, lineno=lineno),
    ]


def wrap_bytecode(wrapper, wrapped):
    code = wrapped.__code__
    lineno = code.co_firstlineno + FIRSTLINENO_OFFSET
    varargs = bool(code.co_flags & CompilerFlags.VARARGS)
    varkwargs = bool(code.co_flags & CompilerFlags.VARKEYWORDS)
    nargs = code.co_argcount
    argnames = code.co_varnames[:nargs]
    try:
        kwonlyargs = code.co_kwonlyargcount
    except AttributeError:
        kwonlyargs = 0
    kwonlyargnames = code.co_varnames[nargs : nargs + kwonlyargs]
    varargsname = code.co_varnames[nargs + kwonlyargs] if varargs else None
    varkwargsname = (
        code.co_varnames[nargs + kwonlyargs + varargs] if varkwargs else None
    )

    instrs = [
        Instr("RESUME", 0, lineno=lineno - 1),
        Instr("PUSH_NULL", lineno=lineno),
        Instr("LOAD_CONST", wrapper, lineno=lineno),
        Instr("LOAD_CONST", wrapped, lineno=lineno),
    ]

    # Build the tuple of all the positional arguments
    if nargs:
        instrs.extend(
            [Instr("LOAD_FAST", argname, lineno=lineno) for argname in argnames]
        )
        instrs.append(Instr("BUILD_TUPLE", nargs, lineno=lineno))
        if varargs:
            instrs.extend(
                [
                    Instr("LOAD_FAST", varargsname, lineno=lineno),
                    Instr("BINARY_OP", BinaryOp.ADD, lineno=lineno),
                ]
            )
    elif varargs:
        instrs.append(Instr("LOAD_FAST", varargsname, lineno=lineno))
    else:
        instrs.append(Instr("BUILD_TUPLE", 0, lineno=lineno))

    # Prepare the keyword arguments
    if kwonlyargs:
        for arg in kwonlyargnames:
            instrs.extend(
                [
                    Instr("LOAD_CONST", arg, lineno=lineno),
                    Instr("LOAD_FAST", arg, lineno=lineno),
                ]
            )
        instrs.append(Instr("BUILD_MAP", kwonlyargs, lineno=lineno))
        if varkwargs:
            instrs.extend(_update_map_block(varkwargsname, lineno))

    elif varkwargs:
        instrs.append(Instr("LOAD_FAST", varkwargsname, lineno=lineno))

    else:
        instrs.append(Instr("BUILD_MAP", 0, lineno=lineno))

    instrs.extend(_call_return_block(3, lineno))

    if CompilerFlags.GENERATOR & code.co_flags and not (
        CompilerFlags.COROUTINE & code.co_flags
    ):
        wrap_generator(instrs, code, lineno)

    return Bytecode(instrs)


def wrap(f, wrapper):
    wrapped = FunctionType(
        f.__code__,
        f.__globals__,
        "<wrapped>",
        f.__defaults__,
        f.__closure__,
    )

    wrapped.__kwdefaults__ = f.__kwdefaults__

    code = wrap_bytecode(wrapper, wrapped)
    code.freevars = f.__code__.co_freevars
    code.name = f.__code__.co_name
    code.filename = f.__code__.co_filename
    code.flags = f.__code__.co_flags
    code.argcount = f.__code__.co_argcount
    try:
        code.posonlyargcount = f.__code__.co_posonlyargcount
    except AttributeError:
        pass

    nargs = code.argcount
    try:
        code.kwonlyargcount = f.__code__.co_kwonlyargcount
        nargs += code.kwonlyargcount
    except AttributeError:
        pass
    nargs += bool(code.flags & CompilerFlags.VARARGS) + bool(
        code.flags & CompilerFlags.VARKEYWORDS
    )
    code.argnames = f.__code__.co_varnames[:nargs]

    f.__code__ = code.to_code()

    return f


def test_wrap_generator():
    def wrapper(f, args, kwargs):
        for _ in f(*args, **kwargs):
            yield _

    def g():
        for _ in range(10):
            yield _
        return

    wrap(g, wrapper)


test_wrap_generator()
Traceback (most recent call last):
  File "/private/tmp/bc.py", line 233, in <module>
    test_wrap_generator()
  File "/private/tmp/bc.py", line 230, in test_wrap_generator
    wrap(g, wrapper)
  File "/private/tmp/bc.py", line 215, in wrap
    f.__code__ = code.to_code()
                 ^^^^^^^^^^^^^^
  File "/private/tmp/bc/lib/python3.11/site-packages/bytecode/bytecode.py", line 306, in to_code
    stacksize = cfg.compute_stacksize(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/bc/lib/python3.11/site-packages/bytecode/cfg.py", line 547, in compute_stacksize
    args = coro.send(None)  # type: ignore
           ^^^^^^^^^^^^^^^
  File "/private/tmp/bc/lib/python3.11/site-packages/bytecode/cfg.py", line 280, in run
    for i, instr in enumerate(self.block):
  File "/private/tmp/bc/lib/python3.11/site-packages/bytecode/cfg.py", line 71, in __iter__
    raise ValueError(
ValueError: ('TryBegin target must a BasicBlock, got %s', 'Label')

After looking at the code in ControlFlowGraph.from_code, I tried creating a new instance of TryBegin from the same label, instead of reusing the same instance in two distinct places, and now I get

E           RuntimeError: Failed to compute stacksize, got negative size

which might be due to my list of instructions. I presume reusing TryBegin objects pointing at the same Label is not supported then?

If memory serves, you need TryBegin to be unique so that each TryEnd can refer unambiguously to the TryBegin it refers to.

In bytecode, we could rely on the fact that TryBegin can not be nested, but at the CFG level it becomes harder due to the fact that a TryBegin may have multiple ends associated with different basic blocks.

A documentation clarification may be worthwhile.

Yep, some documentation notes on when to use Label and BasicBlock would be nice, as well as a note on not to re-use a TryBegin. Knowing that TryBegin and TryEnd can't be nested, I thought it was OK to re-use one, as I was trying to implement nested try blocks.

We could try to lift the requirement around TryBegin by updating the conversion code. Similarly to #128, I hope to look at this within the next month.

We could try to lift the requirement around TryBegin

The uniqueness or the nesting one? I think if we could lift the nesting requirement then the other would come for free. However, not being too familiar with the code, I'm not sure how hard it would be to turn nested intervals into a disjoint union of intervals, for which the non-nesting property would hold.

I mean the uniqueness one. The nesting one comes from CPython exception table and I am not keen on diverging too much from it.

I see. My idea was to have TryBegin and TryEnd implement a higher-level API around the exception table. For example, the nesting

tb1 = TryBegin(label1)
tb2 = TryBegin(label2)

Bytecode([
    tb1,
    ...
    tb2,
    ...
    TryEnd(tb2),
    ...
    TryEnd(tb1),
]

would be translated into 3 (non-nested, non-overlapping) exception table entries.

I understand your feature request but I do not have the bandwidth to see how to address it at the moment. I will try to revisit it later.