cornell-zhang/allo

[Feature] Rewind memory access loops

redbudgithubsec opened this issue · 1 comments

Issue
Currently, the automatically generated loops for accessing top-level variables are pipelined at II=1, which is great. However, in my testing this can still lead to 10x the theoretical runtime for 2D arrays.

Solution
Adding rewind to the end of the automatically generated pipeline pragma fully solves this performance issue, and sometimes also reduces hardware usage.
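
For illustration, here is a minimal sketch of what such a load loop could look like with the pragma changed. This is not Allo's actual generated code: the function name `load_buf` and the 4x4 dimensions are hypothetical, and I'm assuming the Vitis HLS form `#pragma HLS pipeline II=1 rewind`. A regular C++ compiler just ignores the pragma, so the functional behavior is unchanged.

```cpp
#include <cstdint>

// Hypothetical dimensions, chosen only for this sketch.
constexpr int ROWS = 4;
constexpr int COLS = 4;

// Sketch of an auto-generated access loop: copy a top-level 2D argument
// into a local buffer. Names and shapes are assumptions, not Allo output.
void load_buf(const int32_t src[ROWS][COLS], int32_t buf[ROWS][COLS]) {
  for (int i = 0; i < ROWS; ++i) {
    for (int j = 0; j < COLS; ++j) {
// Current generated form (per this issue): #pragma HLS pipeline II=1
// Proposed form, with rewind appended:
#pragma HLS pipeline II=1 rewind
      buf[i][j] = src[i][j];
    }
  }
}
```

The only difference from the current setup is the trailing `rewind` keyword on the pipeline pragma.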

Example - My matrix-vector multiply program.
Without rewind (current setup):
image
78 cycle interval for buf1

With rewind manually added:
image
4 cycle interval achieved.

Sorry, this is Zack. I'm just on the wrong account.