Support instructions that clobber all registers and have non-fixed early defs
frank-emrich opened this issue · 6 comments
As part of bytecodealliance/wasmtime#9078, I added a new x64 MInst
called StackSwitchBasic
which clobbers all registers. In the process, I noticed the following: The MInst
requires two temporary registers, which I tried to get by adding two additional Writable<Gpr>
fields to StackSwitchBasic
and register them as reg_early_defs
in x64_get_operands
.
However, this makes register allocation always fail for StackSwitchBasic
with TooManyLiveRegs
.
As a workaround, I'm simply picking two unused registers at emission time , which is safe just because the instruction is known to clobber all registers. That means that this issue isn't really urgent, the workaround works as expected.
This commit on top of Wasmtime main shows a patch that makes the problem easily reproducible: It's undoing my workaround and restoring what I originally had in mind for obtaining the temporary regs. On that branch, running the cranelift filetests fails on stack_switch.clif
.
(Sorry for not having a standalone repro without relying on Wasmtime, but I've not used regalloc2 as a standalone library before)
As another workaround on top of the commit mentioned above, hard-coding the two temporary registers by setting them to fixed early defs also makes the problem go away. But I don't understand regalloc well enough to judge if that may reduce the quality of the generated code by imposing additional constraints.
What I'm seeing here may just be a variation of the problem already described in #145. Please let me know if it is, otherwise I'll cross reference this issue in my workaround in emit.rs
.
You are supposed to remove the set of defined registers from the set of clobbered registers. Regalloc2 has the invariant that each register is either unchanged, marked as defined or marked as clobbered.
@frank-emrich I think this may "just" be an issue with the way the allocation constraints are specified to regalloc2. My mental model for an instruction's semantics and use- and def- points is roughly:
- Early uses are read and early defs are written (in parallel, simultaneously).
- Clobbers occur.
- Late uses are read and late defs are written (in parallel, simultaneously).
All within a single instruction, so no spills/reloads or moves can be inserted between steps 1-2 or 2-3.
Given that, it seems like it should be a "too many live registers" error if we have a def at step 1, and step 2 clobbers (blocks) all registers in a class; there is no register remaining that can make it past the clobbers and carry the live value downward.
A solution might be to remove the registers that are def'd from the clobber set. The effect is the same anyway -- it blocks the register from being used by other values live across this instruction.
Thanks for your responses!
A solution might be to remove the registers that are def'd from the clobber set.
I understand how to do that if my early defs were fixed, but I'm not sure how to do it if they are non-fixed Reg
s, or if it's even possible in that case. After all, the clobber set is specified as a set of PReg
s. Does that simply mean that what I'm trying to do can only be achieved with fixed early defs, or am I missing something?
@frank-emrich I guess there are two options: use fixed early defs, or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit. Otherwise the problem as specified to regalloc is impossible to solve because the clobber-set leaves no space for any live values in the requested register class.
[...] or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit
Ah, I hadn't thought of that option. Thanks for clarifying!
In any case, closing this issue since regalloc is behaving correctly. I just didn't have a good enough mental model of how it works to realize that I was asking for the impossible.