EmbarkStudios/rust-gpu

We should override `thir_body` to inject loop merge points for structurization.

eddyb opened this issue · 1 comments

eddyb commented

I was writing up this SPIR-T issue when I realized we can do much better than I ever thought we can:

The "subgroup reconvergence"/"loop merges" problem statement

the fundamental ambiguity here (wrt reconvergence), that explicit merges solve, looks like this:

loop {
    // ...
    if cond {  // the "then" edge *leaves the loop*
        A();   // !!! it might be important that this STAYS INSIDE the loop!
        break; // this is just a linear A -> B edge in the CFG (outside the loop's cycle)
    }
}
B(); // !!! it might be important that this STAYS OUTSIDE the loop!

because the A() -> B() edge just looks like a redundant fusable edge (isomorphic to a single A(); B(); basic block), structurizers will very likely either produce:

  • loop { ... if cond { break; } } A(); B(); (moving A() out of the loop)
    • A() no longer called for "early finishers" (losing a desired side-effect)
  • loop { ... if cond { A(); B(); break; } } (moving B() into the loop)
    • B() now being called for "early finishers" (gaining an undesired side-effect)

and they're both bad because a non-uniform break (i.e. an "early finisher") will block on subgroup neighbors (semantically, at least, in hardware the now-inactive lanes will likely stick around while loop instructions continue being executed for remaining lanes, until they all eventually break)

An actual reasonable solution for once (what prompted me to open this issue)

A better approach might be to "just" make up a SPIR-T instruction that acts as "loop merge barrier", i.e.:

loop {
    // ...
    if cond {
        A();   // !!! definitely INSIDE the loop
        break; // just an edge to the `LoopMerge`, in the CFG
    }
}
asm!("spirt.ControlInst.LoopMerge");
B(); // !!! definitely OUTSIDE the loop

Each LoopMerge would be consumed by one loop structurization - if you have nested loops, you'd need to be careful not to forget to add them to each loop.

But wait, even if we make this ergonomic with macros... we'd still want to check that you don't use any instructions that "care" about e.g. subgroups, right? That could be e.g. a MIR check pass, right? Which we can inject... just like...

Yes, that's right, rustc_codegen_spirv has enough query overriding power that it could override thir_body, which MIR construction uses as the source of truth, to introduce additional nodes around loops, allowing Rust to be on par with GLSL wrt structured control-flow guarantees.


Additionally, this could be used to replace #[spirv(unroll_loops)] (once that's removed, see #940 (comment) for more context), as it would allow us to check for a (new) #[spirv(unroll)] attribute on the loop expression, and then store that in the injected merge point.

That would be much nicer than fn-level catch-alls, and generally a good fit for SPIR-V expressivity.

eddyb commented

There might be a simpler way to do this, by placing, just before every break:

if black_box(false) { // or rather, some `asm!` equivalent
    // This will never be reached, but the potential to continue looping
    // would keep the `break`-ing block considered to be part of the loop.
    continue;
}

(with the condition being replaced by false only after SPIR-T structurization)

This does require SPIR-T structurization to do "minimal loops", but that's already the plan, given how unusual/unexpected/unwanted the "maximal loops" naturally produced by the structurization algorithom, are.

The only serious caveat I potentially foresee is this could impact borrow/move checking, and cause Rust errors that wouldn't be there if the loop was treated as "definitely being exited" from those breaks.