Support branch instructions that define their blockparams
bjorn3 opened this issue · 10 comments
This would allow an invoke
instruction to define a virtual register representing the return value and then pass this as as blockparam to the regular return block. And similarly define a virtual register representing the landingpad argument and pass it as blockparam to the block for the landingpad.
@bjorn3 and I were discussing this just now; to add a little more background:
invoke
encapsulates both the call itself, and the (virtual) branch that is implied by the potential unwinding control flow. Separating them is problematic because the regalloc is allowed to insert arbitrary moves between instructions and we can't have any between the call and the "branch": if an unwind happens, it will not execute these moves.- modeling the regular (non-exceptional) return values as blockparams for the non-exceptional continuation makes sense because semantically the results only exist/are defined in this case: on unwind, the function hasn't actually returned values.
A thought does occur to me though: if we relax the second point above, this may become a bit simpler. Working backward from what we want to happen in the "backend" of regalloc: we want to allocate new register(s) for the call results; we want their liverange to start with the underlying call instruction. This is a little different from blockparams, which logically start at the successor block, so we have a little "stub" of a liverange just over the invoke/branch instruction anyway. What if we take that further, relax the results-as-blockparams, and do something like:
v1 = invoke ..., regular block1(...), exceptional block2(...)
block1(...): ; regular-return (fallthrough) case
(use v1)
block2(...): ; exceptional case, reached implicitly via unwind
(don't use v1, it won't have been written)
basically, the idea is to construct the RA2 problem as a conservative superset of the required allocations: the invoke results are allocated at the call, and in-scope in all blocks they dominate as normal, including the normal return path and exceptional return path; the fact that the call will not have written the result on unwind is a fact at a higher semantic level that we respect (by not reading the register on that path) in the CLIF.
The remaining question is then: can branches def values? I can't think of a specific case or check that would break this, but there may be some subtle interaction. In particular I believe that edge-moves are not a problem as long as invoke has two targets: this will force edge-moves to be placed in the successor blocks' heads rather than just before the branch. (The edge-moves come logically after the def so it would be a problem otherwise.)
Others have any thoughts?
In the unwind case there are also return values. These may be of different types than for the regular-return case, but generally do occupy the same registers. These need to be supported as they are often used to pass around the exception object itself.
This actually already works in regalloc2 without any changes needed. First, we must differentiate between 2 types of branches:
branch
terminators can have multiple targets but cannot have blockparams.jump
terminators can only have 1 target and cannot have operands, but can have blockparams.
Notably, it's possible for branch
terminators to have def operands. Any moves associated with these would be placed in the corresponding successor blocks.
You can build an invoke
as a branch
terminator which also defs 2 output values in fixed registers: the normal return value and the exception pointer. The normal return value is only used in the normal successor block while the exception pointer is only used in the landing pad.
The last commit in #170 extends the fuzzer to test this case.
Neat, that's I think essentially what I describe above except two defs (exception ptr as well). Hopefully a fairly straightforward change on the producer side!
You can build an invoke as a branch terminator which also defs 2 output values in fixed registers: the normal return value and the exception pointer.
Both the normal and exception return cases can have a variable amount of return values. For exception returns it is up to two registers. I guess one way to handle this would be to look at the signature of the called function to know how many normal return values there are and treat the rest as exception return values.
You can build an invoke as a branch terminator
How can I lower the block calls on the clif ir side to something that avoids blockparams on the regalloc side?
I think a reasonable design might be:
-
An
invoke
of a function with N return types(T1, T2, ...)
and M exceptional returns(Exc1, Exc2, ...)
always has a result tuple ofN+M
values:(T1, T2, ..., Exc1, Exc2, ...)
. The values of the former are undefined on exception, and the latter are undefined on normal return. If we wanted to keep out undefined behavior in the CLIF semantics, we could add a verifier rule that the values are only used in blocks dominated by the appropriate edge.(if any pass that examines the
invoke
needs to know which are which, it can look at the signature of the function to knowN
, then getM
from the number of total results) -
The block-calls in CLIF are still lowered to blockparams provided by this "branch" as seen by regalloc; the only thing is that the function returns and exceptional values are not additional blockparams.
So to add an example, if we have
fn0 = %f(i32, i32) -> (i64, i64)
block0(v1: i32, v2: i32):
v3, v4, v5, v6 = invoke %f(v1, v2), block1(v1), block2(v2) ;; v3, v4 are normal returns; v5, v6 are exceptional returns
block1(v7: 32): ;; v7 is a normal blockparam
;; use v3, v4
block2(v8: i32): ;; v8 is a normal blockparam
;; use v5, v6
then the regalloc metadata for the invoke is:
v1
,v2
are early-uses (constrained per ABI)v3
,v4
,v5
,v6
are late-defs (constrained per ABI)is_branch
= true, two successors, blockparams of[[v1], [v2]]
Cross posting bytecodealliance/rfcs#36 here because I think both discussions should be aware of each other.
Random notes on my preferences for the clif design side of things, since it is relevant to the discussion here about how we would lower/translate this to regalloc2 input:
-
I know this is super nitpicky, but I pretty strongly prefer
try_call
toinvoke
as far as naming the clif instruction goes, since the latter really doesn't imply anything to do with fallibility other than it happens to be the same name that LLVM chose. -
I think we want to annotate landing pad blocks in clif so that the verifier can easily check that normal control doesn't flow to them. This is the idea behind the
catch
prefix (similar to thecold
prefix) for blocks that is described in the linked RFC. Not sure if this makes sense for regalloc input as well or not; not clear to me that regalloc would want or need to treat such blocks differently or not. -
An
invoke
of a function with N return types(T1, T2, ...)
and M exceptional returns(Exc1, Exc2, ...)
always has a result tuple ofN+M
values:(T1, T2, ..., Exc1, Exc2, ...)
.To be clear, this would only be done in the regalloc2 side of things, right? That is, the clif would still terminate a block with a
try_call
and then the successor blocks would have block params? I feel pretty strongly that we want to have that shape of clif, and not rely on the verifier to check that the undefined values are not used in successor blocks that don't match the exceptionality of the return. I don't have strong feelings about the shape of the regalloc input tho. -
Both the normal and exception return cases can have a variable amount of return values. For exception returns it is up to two registers.
Is there any reason we can't start by simply fixing the amount of exceptional return values to exactly one or exactly two (whatever is sufficient for both wasm exceptions and clif's panic unwinding)? I feel like variable exceptional returns is a complication that we should be able to ignore for now. Of course, if this doesn't actually make anything simpler on the regalloc side of things, no need to artificially build that constraint in.
I think we want to annotate landing pad blocks in clif so that the verifier can easily check that normal control doesn't flow to them.
There is no fundamental reason why this is necessary. In fact for wasm lowering it may be beneficial if this restriction doesn't exist sich that throwing an exception can be done by directly jumping to the cleanup block while still sharing this block with the unwind target for calls. With this restriction in place wasm lowering would need to add an extra block whose sole purpose is to be marked as landingpad and jump to the real cleanup block.
Edit: It may be necessary for another block to be inserted in between when lowering to vcode for any register moves, but that has to be done anyway when two calls share a landingpad.
An
invoke
of a function with N return types(T1, T2, ...)
and M exceptional returns(Exc1, Exc2, ...)
always has a result tuple ofN+M
values:(T1, T2, ..., Exc1, Exc2, ...)
.To be clear, this would only be done in the regalloc2 side of things, right? That is, the clif would still terminate a block with a
try_call
and then the successor blocks would have block params? I feel pretty strongly that we want to have that shape of clif, and not rely on the verifier to check that the undefined values are not used in successor blocks that don't match the exceptionality of the return. I don't have strong feelings about the shape of the regalloc input tho.
The context of this discussion is how to present the instruction to RA2, so yes, my thinking is mostly from that side. That said, I think this issue is a little more subtle than it may seem; will write up some comments over in the proposal (I hadn't had a chance to read it when posting the above as it had just appeared!).