calyxir/calyx

[Cider 2.0] Groups taking an extra cycle to exit

Closed this issue · 4 comments

I mentioned this elsewhere but wanted to get it written up for general clarity. Imagine we have the standard group that increments a register which looks something like:

group incr {
  // r <- r + 1 ...
  incr[done] = r.done
}

After we run convergence, the r.in and r.write_en should both have meaningful values. After advancing the clock by a cycle, r.done will be 1 and r.out will be r+1. So we have something like this:

port value
r.done 1
r.out r + 1
incr[done] 0

This is because we need to run convergence again for the incr[done] = r.done assignment to fire. Normally this doesn't matter since we always start a program step by running convergence, but in the particular case of group terminations this creates an issue.

The pseudo-code for our simulation step looks something like

assigns = [ list of all assignments in the active groups]

while not converged:
    run assignments;
    for all cells:
        compute combinational updates

for all cells:
    advance by a clock cycle

So in our case here, at the start of the subsequent program step we still view incr as active since the done value is 0/undef meaning all the assignments from incr are considered active, meaning the group stays active for a second cycle.

If we diagram the convergence (c) and cycle updates (t for tick) which constitute a step (s) like so:

c0 t0 || c1 t1 || c2 t2
------------------------
   s0      s1       s2

The group should be done after we execute s0 the first program step. However because we haven't yet run convergence, the done value does not get propagated until c1 meaning we have already started the second step. This means that the group remains active for s1 and doesn't become inactive until s2, essentially extending the duration for a single cycle.

There are broadly speaking, two ways to address this. In Cider 1, we opted to just run convergence a second time, so program steps became

c0 t0 c0' || c1 t1 c1' ...
--------------------------
     s0         s1

while this resolves the group termination issue, it introduces a bunch of redundant work as c0' and c1 are computing the exact same thing for groups which have not just finished executing since they are consecutive calls. This re-computation would be made worse in the newer semantic model used by the prototype of v2 that first sets all ports to undef before running convergence, as nearly all the work computed in c0' will immediately be discarded when we begin c1

There is likely another set of alternatives which involve un-scheduling a group mid program step. We could selectively re-converging when a group is observed to have finished after convergence

c0 t0 || c1 c1' t1

where c1' occurs with the assignments from the relevant group disabled. This still is has a fair bit of redundant work since this would be required after every cycle in which a group became done.

Another alternative might be selectively recomputing just the cells/ports for the group in question, i.e. mark all their ports as undefined and rerun only the assignment which made incr[done] be 1 so the cells compute their combinational updates with undefined inputs. This saves probably the most work, but also could introduce errors for programs with weird undefined behaviors, such as reading from the output of an adder in a different group than the one in which it is set. If we exclude weird parallel compositions though, I believe this doesn't introduce any correctness issues, though I may be wrong.

There might also be another alternative approach which could be taken as well, so I am curious what other people think.

@sampsyo The promised issue.

Thanks for writing this up in detail!

To generalize your example slightly, let's imagine that every group[done] signal is driven from some reg.out. (This is probably close to true in practice, and it probably doesn't lose much w/r/t other cases.) Then the lifetime of a group execution in general looks like this:

cycle number | 0 | 1 | 2 | … | (n-1) | n
reg.out      | 0 | 0 | 0 | … | 0     | 1

That is, the assignment to reg happens in cycle $n-1$, and the group starts signaling that it is done in cycle $n$. This question is about what happens during cycle $n$, when reg.out starts outputting 1 and we notice that the group is done.

One super dumb question to be clear on first here is: what is supposed to happen to all the other (non-done) assignments during cycle $n$? Obviously, the assignments are supposed to be active in the cycles $0..(n-1)$, and they are supposed to be inactive in subsequent cycles. But what do they do in that cycle? I really really hope the answer is that all those assignments are supposed to be inactive during that cycle. In other words, if we still had a bottom-up compiler pass, the compiler would flatten groups by translating all assignments from x = g ? y to x = !group[done] & g ? y. (This seems like an important thing to state in the docs if we are confident that it is a true statement!)

If the above assumption holds, then I guess I want to check whether the following solution would at least suffice (performance aside): make all assignments in groups implicitly guarded on !group[done]. That is, as with any other guard, these assignments would only "fire" once group[done] is resolved for that cycle: if it resolves to 0, then the assignment proceeds as normal; if it resolves to 1, then the assignment doesn't fire. Maybe this amounts to morally the same thing as your "un-scheduling a group mid program step," but it could be handled in a uniform way, just like the "real" guard.

I really really hope the answer is that all those assignments are supposed to be inactive during that cycle. In other words, if we still had a bottom-up compiler pass, the compiler would flatten groups by translating all assignments from x = g ? y to x = !group[done] & g ? y

Effectively yes. The actual transformation looks like this:

x = group[go] & g ? y

Eventually, group[go] will get the assignment:

group[go] = fsm.out == n & !group[done]

Which means that final assignment is:

x = fsm.out == n & !group[done] & g ? y

The group[done] assignment (which really isn't an assignment at all in any meaningful sense of that word in Calyx) is special and does not get this guard. Effectively, this means that in cycle $n$, there is a convergence step but because the guards of all assignments are false, no update to the ports occurs.

Thanks. I think this confirms the most straightforward route for Cider: conceptually, "condition" every assignment on 3 things: the guard, the group's "go" signal, and the negation of the group's "done" signal. All 3 items must be true in order for the assignment to fire. (The latter two, of course, only when the assignment is inside a group!) The conditioning can hopefully work the same way for these 3 factors.