chipsalliance/Cores-VeeR-EH1

GHR refresh

Zissi-Lei opened this issue · 1 comments

Hi,
in the file "ifu_bp_ctl.sv", there is GHR shift logic at line 1032:

    assign merged_ghr[`RV_BHT_GHR_RANGE] = ( ({`RV_BHT_GHR_SIZE{num_valids[3:0] >= 4'h4}} & {`RV_BHT_GHR_PAD,  final_h }) | // 000H
                                      ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h3}} & {`RV_BHT_GHR_PAD2, final_h}) | // P00H
    `ifdef RV_BHT_GHR_SIZE_2
                                      ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {                            1'b0, final_h}) | // PP0H
    `else
                                      ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {fghr[`RV_BHT_GHR_SIZE-3:0], 1'b0, final_h}) | // PP0H
    `endif
                                      ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h1}} & {fghr[`RV_BHT_GHR_SIZE-2:0], final_h}) | // PPPH
                                      ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h0}} & {fghr[`RV_BHT_GHR_RANGE]}) ); // PPPP

I see that when num_valids[3:0] ≤ 4'h2, you just shift the GHR left without retaining the MSBs. But when num_valids[3:0] ≥ 4'h3, you choose to retain the MSBs of the GHR rather than shift it left as before. Are there other considerations behind this policy? I'm confused by this logic. Thanks for your time!

From the designer:

Part 1: “I see that when num_valids[3:0] ≤ 4'h2, you just shift the GHR left without retaining the MSBs”

This is only true for small BHTs that don’t have more bits in the GHR. The code shows this in the conditional:

    assign merged_ghr[`RV_BHT_GHR_RANGE] = ( ({`RV_BHT_GHR_SIZE{num_valids[3:0] >= 4'h4}} & {`RV_BHT_GHR_PAD,  final_h }) | // 000H
                                  ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h3}} & {`RV_BHT_GHR_PAD2, final_h}) | // P00H
`ifdef RV_BHT_GHR_SIZE_2                                
                                  ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {                            1'b0, final_h}) | // PP0H
`else
                                  ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {fghr[`RV_BHT_GHR_SIZE-3:0], 1'b0, final_h}) | // PP0H
`endif
                                  ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h1}} & {fghr[`RV_BHT_GHR_SIZE-2:0], final_h}) | // PPPH
                                  ({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h0}} & {fghr[`RV_BHT_GHR_RANGE]}) ); // PPPP
 

Also, for num_valids < 2, we clearly have the fghr upper bits.
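
To make the per-case behavior easier to read, here is a minimal sketch that restates the merge as a case statement for a fixed 5-bit GHR. It is not the EH1 source: the module name and the `pad` and `pad2` ports are hypothetical stand-ins for the RV_BHT_GHR_PAD and RV_BHT_GHR_PAD2 macros, whose contents depend on the configuration.

```systemverilog
// Sketch only: a fixed 5-bit restatement of the merge, not the EH1 source.
// pad and pad2 are hypothetical ports standing in for the RV_BHT_GHR_PAD and
// RV_BHT_GHR_PAD2 macros, whose contents depend on the configuration.
module ghr_merge_sketch (
  input  logic [4:0] fghr,        // current fetch GHR
  input  logic [3:0] num_valids,  // valid branches in the fetch group
  input  logic       final_h,     // newest history bit
  input  logic [3:0] pad,         // stand-in for RV_BHT_GHR_PAD
  input  logic [3:0] pad2,        // stand-in for RV_BHT_GHR_PAD2
  output logic [4:0] merged_ghr
);
  always_comb begin
    case (num_valids)
      4'h0:    merged_ghr = fghr;                        // no new history
      4'h1:    merged_ghr = {fghr[3:0], final_h};        // shift left by 1
      4'h2:    merged_ghr = {fghr[2:0], 1'b0, final_h};  // shift left by 2
      4'h3:    merged_ghr = {pad2, final_h};             // upper bits come from PAD2
      default: merged_ghr = {pad, final_h};              // num_valids >= 4, upper bits from PAD
    endcase
  end
endmodule
```

With a 5-bit GHR, the cases for 0, 1 and 2 valid branches always carry fghr bits forward. What happens to the upper bits for 3 or more is decided entirely by the two PAD values.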

Part 2: “But when num_valids[3:0] ≥ 4'h3, you choose to retain the MSBs of GHR, rather than just to shift it left like before”

This is better for our benchmarks and comes down to the accuracy of the predictor when there are many valid branches in the fetch group. If you would prefer to do a full shift, you can modify the RV_BHT_GHR_PAD and RV_BHT_GHR_PAD2 defines.
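
As a rough illustration of what such a change could look like, the fragment below sketches full-shift values for a 5-bit GHR. The commented-out "retain" values are an assumption about what a generated configuration may contain, so check your own generated defines file before editing. The full-shift values follow the pattern of the num_valids == 2 case, which shifts fghr left and fills with zeros.

```systemverilog
// Assumption: values a 5-bit GHR configuration may generate (retain the MSBs).
// Verify against your own generated defines before relying on this.
// `define RV_BHT_GHR_PAD  fghr[4],3'b0
// `define RV_BHT_GHR_PAD2 fghr[4:3],2'b0

// Hypothetical full-shift alternative for a 5-bit GHR:
// num_valids >= 4 shifts left by 4, num_valids == 3 shifts left by 3.
`define RV_BHT_GHR_PAD  fghr[0],3'b0
`define RV_BHT_GHR_PAD2 fghr[1:0],2'b0
```

Either way, each macro must expand to RV_BHT_GHR_SIZE-1 bits, because the merge concatenates it with the single final_h bit.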

(The likelihood of predicting 3 or more branches correctly is low (0.85^3), so we preserve the upper bits. In practice it doesn’t really matter, since we copy the true GHR from the EXU when we mispredict.)
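
For reference, the figure quoted above works out as follows. This assumes that 0.85 is the per-branch prediction accuracy and that the predictions are independent, neither of which is stated explicitly in the thread.

```latex
\[
P(\text{3 branches all correct}) = 0.85^{3} \approx 0.61,
\qquad
P(\text{4 branches all correct}) = 0.85^{4} \approx 0.52
\]
```

Under those assumptions, a fetch group with 3 or 4 valid branches has at least one misprediction roughly 40 to 50 percent of the time.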

Hope this helps.