Replay queue depth insufficient for RoCC accelerators

Type of issue: other enhancement

Impact: no functional change

Development Phase: proposal

Other information

When a RoCC accelerator sends memory requests back-to-back, the two entries in the replay queue are not sufficient to handle a request every cycle.

rocket-chip/src/main/scala/rocket/SimpleHellaCacheIF.scala, line 103 in dbcb06a:

val replayq = Module(new SimpleHellaCacheIFReplayQueue(2))

This can be fixed simply by increasing the depth of the replayq from 2 to 3.

If the current behavior is a bug, please provide the steps to reproduce the problem:

The problem and the proposed fix can be explored by adding an accelerator that simply loads the same address repeatedly, back-to-back. After the initial miss, the L1D should be able to service one load per cycle, since the data is already in the L1D. With the current depth, the replayq causes back-pressure in one out of every three clock cycles. With the suggested fix applied, it can handle a memory request every cycle.
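The reported numbers can be reproduced with a toy occupancy model. This is only a sketch, not the rocket-chip implementation: it assumes that each accepted request holds one replayq entry for three cycles, a value picked because it matches the stall pattern described above, and the names `ReplayQueueModel`, `accepted`, and `holdCycles` are hypothetical.

```scala
// Toy occupancy model, NOT the rocket-chip implementation.
// Assumption: each accepted request holds one replay-queue entry for
// `holdCycles` cycles (3 here, chosen to match the reported behavior).
object ReplayQueueModel {
  /** Number of requests accepted over `cycles` cycles when the requester
    * offers one request every cycle. */
  def accepted(depth: Int, holdCycles: Int, cycles: Int): Int = {
    // Cycle at which each currently held entry becomes free again.
    var releaseAt = List.empty[Int]
    var count = 0
    for (t <- 0 until cycles) {
      // Free the entries whose hold time has elapsed.
      releaseAt = releaseAt.filter(_ > t)
      if (releaseAt.length < depth) {
        // A free entry exists: accept the request and occupy the entry.
        releaseAt = (t + holdCycles) :: releaseAt
        count += 1
      }
      // Otherwise the queue is full and the requester is stalled this cycle.
    }
    count
  }

  def main(args: Array[String]): Unit = {
    val cycles = 300
    for (depth <- Seq(2, 3)) {
      val n = accepted(depth, holdCycles = 3, cycles = cycles)
      println(s"depth=$depth accepted=$n of $cycles cycles")
    }
  }
}
```

Under that assumption, depth 2 accepts 200 requests in 300 cycles (a stall in one out of every three cycles), while depth 3 accepts a request in all 300 cycles.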

What is the current behavior?

At the moment, the insufficient replayq depth causes back-pressure to the accelerator in one out of every three cycles.

What is the expected behavior?

The interface should accept a memory request every cycle, with no back-pressure caused by the replayq depth.

Please tell us about your environment:

What is the use case for changing the behavior?

RoCC accelerators that access the L1D with back-to-back memory requests could see up to 50% higher request throughput, going from two requests every three cycles to one request per cycle.

Good observation. I'd be happy to approve a PR that implements the fix. Please open the PR against the dev branch.