riscv-non-isa/riscv-trace-spec

Inferable vs non-inferable jumps need a lot more detail

tomverbeure opened this issue · 5 comments

In earlier chapters, the spec defines uninferable jumps narrowly, in the sense that it doesn't mention constant registers.

In the ingressPort chapter, it says that indirect jumps through constant-loaded registers are also inferable jumps. That makes sense.

But since it's the core (through itype), not the encoder, that determines which jump is inferable and which one is not, this needs to be very precisely specified, because the decoder will need to understand constant register loads in exactly the same way the core does.

Right now, this is not the case.

The spec needs details about what is considered a constant register load.

  • Does it apply only to LUI and AUIPC instructions? Load-immediate instructions should be considered as well. I assume that operations on constant-loaded registers (shift-immediate, add/subtract-constant) are excluded, even though they could qualify...
  • Are there restrictions on how far away the LUI and AUIPC instructions are located from the jump instruction?
  • What kind of barriers will invalidate constant-loaded registers? Are inferable jumps out? Branch instructions should be fine in theory, except when there are multiple non-taken branches in the same instruction block.

I expect that the answer to all these points will be pretty straightforward: a simple 'constant' status bit per register inside the core that gets set to 1 by an AUIPC/LUI/load-immediate, and that gets cleared by any other operation, branch, or jump.

But it needs to be specified.
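To make the question concrete, here is a minimal sketch of the kind of per-register status bit described above. It is not taken from the spec or the reference decoder: the class name `ConstTracker`, the mnemonic sets, and the exact clearing rules (a write clears only the destination's bit, any branch or jump clears every bit) are assumptions chosen for illustration, and they are exactly the details the spec would have to pin down. x0 is left untracked here.

```python
# Hypothetical model of a per-register "constant" bit (illustration only).
CONST_LOADS = {"lui", "auipc", "c.lui"}  # assumed set of constant loads
CONTROL_FLOW = {"beq", "bne", "blt", "bge", "bltu", "bgeu", "jal", "jalr"}


class ConstTracker:
    def __init__(self):
        # One status bit per integer register x0..x31.
        self.is_const = [False] * 32

    def step(self, op, rd=0, rs1=0):
        """Retire one instruction; return True if it is an inferable jalr."""
        inferable = op == "jalr" and self.is_const[rs1]
        if op in CONTROL_FLOW:
            # Assumed barrier rule: any branch or jump clears every bit.
            self.is_const = [False] * 32
        elif rd != 0:
            # A write sets the bit for a constant load, clears it otherwise.
            self.is_const[rd] = op in CONST_LOADS
        return inferable
```

With these rules, an unrelated instruction between the AUIPC and the JALR keeps the jump inferable, while modifying the loaded register (or crossing a branch) does not. The core and the decoder would both need to implement the same rules for the trace to decode correctly.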

Tom

[Iain] I think it should only be AUIPC, LUI and C.LUI. This is the mechanism built into the ISA for forming long-distance jumps. I don’t believe it is worth the additional complexity in terms of design and verification of the CPU, the encoder and the decoder to support other types of constant load, as I don’t believe they will occur often enough to make a meaningful difference to the encoder efficiency.
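As an illustration of how small that set is, the check below recognises the three instructions from a raw instruction word. The helper name `is_const_load` is hypothetical; the encodings are the standard ones (LUI major opcode 0x37, AUIPC 0x17, C.LUI in compressed quadrant 1 with funct3 = 011 and rd not x0 or x2, since rd = x2 is C.ADDI16SP). The sketch does not reject the reserved C.LUI encoding with a zero immediate, and it treats any word with low bits 11 as a 32-bit instruction.

```python
def is_const_load(insn: int) -> bool:
    """Return True if insn encodes LUI, AUIPC or C.LUI (hypothetical helper)."""
    if insn & 0x3 == 0x3:
        # 32-bit encoding: compare the 7-bit major opcode.
        return (insn & 0x7F) in (0x37, 0x17)  # LUI, AUIPC
    # 16-bit compressed encoding.
    quadrant = insn & 0x3
    funct3 = (insn >> 13) & 0x7
    rd = (insn >> 7) & 0x1F
    # Quadrant 1, funct3 011 is C.LUI unless rd is x0 (not C.LUI) or x2 (C.ADDI16SP).
    return quadrant == 0x1 and funct3 == 0x3 and rd not in (0, 2)
```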

Sounds good!

What about the location of AUIPC/LUI/C.LUI?

Should a jump only be considered inferable when one of these 3 instructions is located right before the jump (as, I think, is currently assumed in the reference C decoder)?

I'm fine with that restriction as well, since it's pretty much a standard instruction combo in the ISA spec.

Furthermore, the interface between CPU and encoder will need to be considerably more complex to support the general case of any constant-load. It will require all the state about which register contains a constant, and which register is used for the jump target to be visible to the encoder.

I don't think the interface would need to be more complex, since it's the core that determines which register is still constant and which one is not, and the encoder simply takes that at face value? But the decoder would need to have a similar complexity.

Either way, the point is moot.

Tom

[Iain] No, I think it’s quite likely that compilers will pull the AUIPC earlier so that the register contents are available for use by the jump immediately. If the AUIPC is right before the jump there will be a stall as the jump is dependent on the AUIPC completing, and the pipelined nature of virtually any implementation will mean this will take several cycles.

I don't think this is something to worry about, because the AUIPC/JALR combo was listed as a macro-op fusion candidate in the 2016 macro-op fusion presentation and because the RISC-V ISA spec hints at that as well (see the footnote below table 2.1 on page 17 of the 2.2 version).

But it's obviously not a big deal to support cases where they are not back-to-back.
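One consequence for the decoder when the pair is not back-to-back: AUIPC adds its offset to the address of the AUIPC instruction itself, so the decoder has to remember that address until the JALR arrives. A rough sketch of the target computation, with hypothetical function and parameter names and assuming the register is untouched in between:

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def auipc_jalr_target(auipc_pc: int, imm20: int, imm12: int, xlen: int = 32) -> int:
    """Target of `auipc rX, imm20` followed (anywhere later) by `jalr rd, imm12(rX)`."""
    mask = (1 << xlen) - 1
    # AUIPC: pc of the AUIPC itself plus the upper immediate shifted left by 12.
    base = (auipc_pc + (sext(imm20, 20) << 12)) & mask
    # JALR: add the sign-extended 12-bit offset and clear the lowest bit.
    return (base + sext(imm12, 12)) & mask & ~1
```

For example, `auipc_jalr_target(0x1000, 1, 0xFF8)` gives `0x1FF8`, regardless of how many instructions sit between the two.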

Tom

Resolved