GHR refresh
Zissi-Lei opened this issue · 1 comments
Hi,
in the file "ifu_bp_ctl.sv", there is a GHR shift logic at line 1032:
assign merged_ghr[`RV_BHT_GHR_RANGE] = ( ({`RV_BHT_GHR_SIZE{num_valids[3:0] >= 4'h4}} & {`RV_BHT_GHR_PAD, final_h }) | // 000H
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h3}} & {`RV_BHT_GHR_PAD2, final_h}) | // P00H
`ifdef RV_BHT_GHR_SIZE_2
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & { 1'b0, final_h}) | // PP0H
`else
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {fghr[`RV_BHT_GHR_SIZE-3:0], 1'b0, final_h}) | // PP0H
`endif
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h1}} & {fghr[`RV_BHT_GHR_SIZE-2:0], final_h}) | // PPPH
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h0}} & {fghr[`RV_BHT_GHR_RANGE]}) ); // PPPP
I see that when num_valids[3:0] ≤ 4'h2, you just shift the GHR left without retaining the MSBs. But when num_valids[3:0] ≥ 4'h3, you choose to retain the MSBs of GHR, rather than just to shift it left like before. Is there some other consideration behind this policy? I'm very confused about this logic. Thanks for your time!
From the designer:
Part 1: “I see that when num_valids[3:0] ≤ 4'h2, you just shift the GHR left without retaining the MSBs”
This is only true for small BHTs whose GHR has no further bits to keep (the RV_BHT_GHR_SIZE_2 configuration). The code shows this in the conditional:
assign merged_ghr[`RV_BHT_GHR_RANGE] = ( ({`RV_BHT_GHR_SIZE{num_valids[3:0] >= 4'h4}} & {`RV_BHT_GHR_PAD, final_h }) | // 000H
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h3}} & {`RV_BHT_GHR_PAD2, final_h}) | // P00H
`ifdef RV_BHT_GHR_SIZE_2
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & { 1'b0, final_h}) | // PP0H
`else
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h2}} & {fghr[`RV_BHT_GHR_SIZE-3:0], 1'b0, final_h}) | // PP0H
`endif
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h1}} & {fghr[`RV_BHT_GHR_SIZE-2:0], final_h}) | // PPPH
({`RV_BHT_GHR_SIZE{num_valids[3:0] == 4'h0}} & {fghr[`RV_BHT_GHR_RANGE]}) ); // PPPP
Also, for num_valids < 2, we clearly have the fghr upper bits.
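To make the shift cases concrete, here is a minimal behavioral sketch in Python of the three branches whose operands are fully visible in the code above (num_valids of 0, 1 and 2 in the `else` configuration). The function name and the integer encoding of the GHR are assumptions made for illustration, not part of the RTL. The num_valids ≥ 3 branches are deliberately not modelled, because they depend on the RV_BHT_GHR_PAD and RV_BHT_GHR_PAD2 defines, which are not shown in this thread.

```python
def merge_ghr_shift_cases(fghr: int, final_h: int, num_valids: int, size: int) -> int:
    """Behavioral model of the num_valids 0..2 branches (non-SIZE_2 config).

    Hypothetical helper: fghr is the current history as an unsigned int,
    final_h is the single new history bit, size is the GHR width in bits.
    """
    if size < 3:
        raise ValueError("this sketch models the `else` branch, so size must be >= 3")
    if final_h not in (0, 1):
        raise ValueError("final_h is a single bit")

    mask = (1 << size) - 1
    fghr &= mask

    if num_valids == 0:
        # {fghr[RANGE]}: history is unchanged (PPPP)
        return fghr
    if num_valids == 1:
        # {fghr[SIZE-2:0], final_h}: shift left by one, insert final_h (PPPH)
        return ((fghr << 1) | final_h) & mask
    if num_valids == 2:
        # {fghr[SIZE-3:0], 1'b0, final_h}: shift left by two, insert 0 then final_h (PP0H)
        return ((fghr << 2) | final_h) & mask

    # Depends on RV_BHT_GHR_PAD / RV_BHT_GHR_PAD2, which are not shown here.
    raise NotImplementedError("num_valids >= 3 is not modelled in this sketch")


# Example with a 5-bit GHR: the older history bits survive in the upper positions.
print(format(merge_ghr_shift_cases(0b10110, 1, 0, 5), "05b"))  # 10110
print(format(merge_ghr_shift_cases(0b10110, 1, 1, 5), "05b"))  # 01101
print(format(merge_ghr_shift_cases(0b10110, 1, 2, 5), "05b"))  # 11001
```

In each case the previous history still occupies the upper bits of the result; it is only when the GHR is 2 bits wide that nothing of fghr is left after a two-position shift.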
Part 2: “But when num_valids[3:0] ≥ 4'h3, you choose to retain the MSBs of GHR, rather than just to shift it left like before”
This is better for our benchmarks and comes down to the accuracy of the predictor when there are many valid branches in the fetch group. If you would prefer to do a full shift, you can modify the RV_BHT_GHR_PAD and RV_BHT_GHR_PAD2 defines.
(The likelihood of predicting 3 or more branches correctly is low (0.85^3 ≈ 0.61), so we preserve the upper bits. In practice it doesn’t really matter since we copy the EXU true GHR when we mispredict.)
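As a rough check of that figure, the short Python sketch below computes the chance that every branch in a fetch group is predicted correctly. It assumes the 0.85 per-branch accuracy quoted above and, as a simplification of my own, that the predictions are independent; the function name is hypothetical.

```python
def chance_all_correct(per_branch_accuracy: float, num_branches: int) -> float:
    """Probability that all branches are predicted correctly, assuming independence."""
    return per_branch_accuracy ** num_branches


for n in range(1, 5):
    # 0.85 is the per-branch accuracy quoted in the reply above.
    print(n, f"{chance_all_correct(0.85, n):.2f}")
# 1 0.85
# 2 0.72
# 3 0.61
# 4 0.52
```

Under those assumptions, the shifted-in history for a group of three or more branches is wrong in roughly four cases out of ten, which is the motivation given for keeping some of the older upper bits instead.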
Hope this helps.