Inferable vs non-inferable jumps need a lot more detail

Question


tomverbeure opened this issue 5 years ago · 5 comments

In earlier chapters, the spec defines uninferable jumps narrowly, in the sense that it doesn't mention constant registers.

In the ingressPort chapter, it says that indirect jumps through constant-loaded registers are also inferable jumps. That makes sense.

But since it's the core (through itype), not the encoder, that determines which jumps are inferable and which are not, this needs to be specified very precisely, because the decoder must interpret constant register loads exactly the same way the core does.

Right now, this is not the case.

The spec needs details about what is considered a constant register load.

Is it only for LUI and AUIPC instructions? Load-immediate instructions should be considered as well. I assume that operations on constant-loaded registers (shift-immediate, add/subtract-constant) are excluded, even though they could qualify...
Are there restrictions on how far away from the jump instruction the LUI and AUIPC instructions can be located?
What kind of barriers will invalidate constant-loaded registers? Do inferable jumps count as barriers? Branch instructions should be fine in theory, except when there are multiple non-taken branches in the same instruction block.

I expect that the answer to all these points will be pretty straightforward: a simple 'constant' status bit per register inside the core that gets set to 1 by an AUIPC/LUI/load-immediate, and that gets cleared by any other operation, branch, or jump.
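The status-bit proposal can be sketched as a small state machine, which is also the model a decoder would have to mirror exactly. This is a minimal Python sketch under stated assumptions: the `Op` classes, the `Insn` record and the `ConstTracker` class are hypothetical; `li` is treated as a single `addi rd, x0, imm`; and, following the proposal literally, any branch or jump clears every bit, while any other register write clears only its destination's bit.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    """Hypothetical, simplified instruction classes (not a real decoder)."""
    LUI = auto()     # lui   rd, imm
    AUIPC = auto()   # auipc rd, imm
    LI = auto()      # li    rd, imm  (assumed to be addi rd, x0, imm)
    OTHER = auto()   # any other instruction that writes rd
    BRANCH = auto()  # conditional branch
    JAL = auto()     # direct jump
    JALR = auto()    # indirect jump through rs1


@dataclass
class Insn:
    op: Op
    rd: int = 0
    rs1: int = 0


class ConstTracker:
    """One 'constant' status bit per integer register x0..x31."""

    def __init__(self) -> None:
        self.is_const = [False] * 32

    def step(self, insn: Insn) -> bool:
        """Update the bits for one retired instruction.

        Returns True only for a JALR whose rs1 is still marked constant,
        i.e. a jump that would be classified as inferable.
        """
        if insn.op in (Op.LUI, Op.AUIPC, Op.LI):
            if insn.rd != 0:                 # writes to x0 are discarded
                self.is_const[insn.rd] = True
            return False
        if insn.op is Op.OTHER:
            self.is_const[insn.rd] = False   # result is no longer a known constant
            return False
        # Branches and jumps (including JALR) clear every bit.
        inferable = insn.op is Op.JALR and self.is_const[insn.rs1]
        self.is_const = [False] * 32
        return inferable


# AUIPC into x6, an unrelated write to x5, then a JALR through x6.
t = ConstTracker()
prog = [Insn(Op.AUIPC, rd=6), Insn(Op.OTHER, rd=5), Insn(Op.JALR, rd=1, rs1=6)]
print([t.step(i) for i in prog])  # [False, False, True]
```

Because the unrelated write to x5 does not touch x6's bit, this scheme accepts a constant load that is not immediately before the jump; whether that is allowed is exactly what the spec would have to pin down.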

But it needs to be specified.

Tom

gajinderpanesar commented 2 years ago

Resolved

Answer 1 · 2019-09-09T08:47:54.000Z

Tom, see inline…
Iain

> In the ingressPort chapter, it says that indirect jumps with constant-loaded registers are also an inferable jump. That makes sense.

[Iain] Yes, I thought so too when I wrote it.

> Right now, this is not the case.

[Iain] Agreed.

> The spec needs details about what is considered a constant register load.

[Iain] I think it should only be AUIPC, LUI and C.LUI. This is the mechanism built into the ISA for forming long-distance jumps. I don't believe it is worth the additional complexity, in terms of design and verification of the CPU, the encoder and the decoder, to support other types of constant load, as I don't believe they will occur often enough to make a meaningful difference to the encoder efficiency.

Furthermore, the interface between CPU and encoder will need to be considerably more complex to support the general case of any constant load. It will require all the state about which register contains a constant, and which register is used for the jump target, to be visible to the encoder.
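If only AUIPC, LUI and C.LUI count as constant loads, a decoder that reaches an inferable JALR can rebuild its target purely from the program binary: the immediates of the constant load and of the jump, plus the address of the AUIPC. A minimal sketch of that arithmetic, assuming RV64 (`XLEN = 64`) and using hypothetical helper names (`sext`, `const_value`, `jalr_target`). The semantics follow the RISC-V unprivileged ISA: LUI/AUIPC place a 20-bit immediate in bits 31:12 (sign-extended on RV64), C.LUI places a non-zero 6-bit immediate in bits 17:12, and JALR clears bit 0 of the computed target.

```python
XLEN = 64                      # assumption: RV64
MASK = (1 << XLEN) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def const_value(mnemonic: str, imm: int, pc: int = 0) -> int:
    """Value written to rd by a constant load.

    `imm` is the raw immediate field: imm[31:12] (20 bits) for lui/auipc,
    nzimm[17:12] (6 bits, non-zero) for c.lui. `pc` is the address of the
    instruction itself and only matters for auipc.
    """
    if mnemonic == "lui":
        return sext(imm << 12, 32) & MASK
    if mnemonic == "auipc":
        return (pc + sext(imm << 12, 32)) & MASK
    if mnemonic == "c.lui":
        if (imm & 0x3F) == 0:
            raise ValueError("c.lui immediate must be non-zero")
        return sext(imm << 12, 18) & MASK
    raise ValueError(f"not a constant load: {mnemonic}")


def jalr_target(rs1_value: int, imm12: int) -> int:
    """JALR target: rs1 + sign-extended 12-bit offset, with bit 0 cleared."""
    return (rs1_value + sext(imm12, 12)) & MASK & ~1


# auipc t1, 0x1 at address 0x80000000, then jalr ra, 0x10(t1)
print(hex(jalr_target(const_value("auipc", 0x1, pc=0x80000000), 0x10)))  # 0x80001010
```

Every input here is static except the AUIPC's address, which the decoder already knows from reconstructing the execution path, so the encoder does not need to report the target.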

Answer 2 · 2019-09-09T16:20:49.000Z

[Iain] I think it should only be AUIPC, LUI and C.LUI. This is the mechanism built into the ISA for forming long-distance jumps. I don't believe it is worth the additional complexity, in terms of design and verification of the CPU, the encoder and the decoder, to support other types of constant load, as I don't believe they will occur often enough to make a meaningful difference to the encoder efficiency.

Sounds good!

What about the location of AUIPC/LUI/C.LUI?

Should a jump only be considered inferable when one of these 3 instructions is located right before the jump (as, I think, the reference C decoder currently assumes)?

I'm fine with that restriction as well, since it's pretty much a standard instruction combo in the ISA spec.
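The back-to-back restriction can be sketched as a check on the instruction retired immediately before the jump. This is a minimal sketch, not the reference decoder's code: the `Decoded` tuple and the mnemonic strings are hypothetical, and only `jalr`, `c.jr` and `c.jalr` are treated as indirect jumps.

```python
from typing import NamedTuple, Optional

# Constant loads accepted in this thread: AUIPC, LUI and C.LUI.
CONST_LOADS = frozenset({"lui", "auipc", "c.lui"})
# Indirect jumps whose target comes from rs1.
INDIRECT_JUMPS = frozenset({"jalr", "c.jr", "c.jalr"})


class Decoded(NamedTuple):
    """Hypothetical decoded instruction: mnemonic plus register numbers."""
    mnemonic: str
    rd: int = 0
    rs1: int = 0


def inferable_back_to_back(prev: Optional[Decoded], jump: Decoded) -> bool:
    """Adjacent-pair rule: an indirect jump is inferable only if the
    instruction retired immediately before it is a constant load whose
    destination register is the jump's rs1."""
    if jump.mnemonic not in INDIRECT_JUMPS or prev is None:
        return False
    return (
        prev.mnemonic in CONST_LOADS
        and prev.rd != 0              # a write to x0 is discarded
        and prev.rd == jump.rs1
    )


# auipc t1, ...; jalr ra, ...(t1)  -> adjacent pair
print(inferable_back_to_back(Decoded("auipc", rd=6), Decoded("jalr", rd=1, rs1=6)))  # True
# auipc t1, ...; addi a0, a0, 1; jalr ra, ...(t1)  -> AUIPC hoisted earlier
print(inferable_back_to_back(Decoded("addi", rd=10, rs1=10), Decoded("jalr", rd=1, rs1=6)))  # False
```

In the hoisted case the jump target is still a constant, but this rule classifies the jump as uninferable. That only stays consistent if the core and the decoder apply the identical rule.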

[Iain] Furthermore, the interface between CPU and encoder will need to be considerably more complex to support the general case of any constant load. It will require all the state about which register contains a constant, and which register is used for the jump target, to be visible to the encoder.

I don't think the interface would need to be more complex, since it's the core that determines which register is still constant and which one is not, and the encoder simply takes that at face value? But the decoder would need to have a similar complexity.

Either way, the point is moot.

Tom

Answer 3 · 2019-09-09T16:30:35.000Z

Tom, see inline…
Iain

> Shall a jump only be considered inferable when these 3 instructions are located right before the jump (as, I think, is currently assumed in the reference C decoder)?

[Iain] No, I think it's quite likely that compilers will pull the AUIPC earlier so that the register contents are available for use by the jump immediately. If the AUIPC is right before the jump, there will be a stall, as the jump depends on the AUIPC completing, and the pipelined nature of virtually any implementation means this will take several cycles. (You're correct that this is what the reference decoder currently does, because that is what UltraSoC's encoder HW currently does too. But it needs revising.)

> I don't think the interface would need to be more complex, since it's the core that determines which register is still constant and which one is not, and the encoder simply takes that at face value? But the decoder would need to have a similar complexity.

[Iain] It most definitely would be more complex; see the "Jump Classification Issue" thread. I'll happily explain further if that doesn't answer your questions.

Answer 4 · 2019-09-09T16:57:31.000Z

[Iain] No, I think it's quite likely that compilers will pull the AUIPC earlier so that the register contents are available for use by the jump immediately. If the AUIPC is right before the jump, there will be a stall, as the jump depends on the AUIPC completing, and the pipelined nature of virtually any implementation means this will take several cycles.

I don't think this is something to worry about: the AUIPC/JALR combo was listed as a macro-op fusion candidate in the 2016 macro-op fusion presentation, and the RISC-V ISA spec hints at that as well (see the footnote below Table 2.1 on page 17 of version 2.2). A core that fuses the pair avoids the dependency stall, so there is good reason to keep the two instructions back-to-back.

But it's obviously not a big deal to support cases where they are not back-to-back.

Tom