cucapra/dahlia

Break statement

Opened this issue · 3 comments

I feel we will need "break" statement in dahlia at some point. HLS does not support dynamic loop unrolling, but if...break can get rid of this constraint. Dahlia may have similar constraints.

Can you explain what problem they solve? From what I've heard, using break is generally not recommended (especially labelled break) because the complicate the control Finite-state machine (FSM). A soft rule I've heard from HLS users is that minimizing weird control statements like break is crucial for getting good performance.

Any sorting algorithm needs break or dynamic loop. And you are right, control statements have to be kept as few as possible, but when one has to implement "weird control flow", break is a compromise to dynamic loop because in the latter case it cannot get unrolled at all.
Below is an example of knn where minimal distance needs to be chosen.
https://github.com/ptpan/ece5775/blob/04e2e004619e5edb96bd4dc58a713b6b173bc9ea/lab2/digitrec.cpp#L79
Though break unrolled is not that much different from nop if my understanding is correct.

That KNN loop is a good example!

To summarize the argument against break, from a Dahlia predictability perspective: adding a single line (break) inside an unrolled loop introduces extra hardware for every unrolled loop iteration. The loop PEs need to add exit-check logic to see if the parallel execution they're currently running is going to get "squashed" by the break signal computed by a different PE. That's a pretty large, global change just to support a seemingly-simple control flow construct that's easy for a normal CPU to implement.

Put differently, break is sort of an "anti-parallelism" thing because it introduces implicit serial dependencies between every iteration of the loop. Each iteration needs to know whether the previous iteration "canceled" the loop before it does anything that might have visible side effects.

But of course, sometimes that serialization is precisely what you want, as in the KNN example. I wonder if some sort of combine magic might be a more predictable way to express that—that is, a combine block would be allowed to break because its executions are already serialized! If we can think of a nice way to express things like a min-check reduction in a combiner, that would be awesome…