calyxir/calyx

[Cider] Control node naming scheme proposal

Opened this issue · 7 comments

Per discussions with the broader team we are looking for a way to assign a consistent name to arbitrary nodes in a component's control program. These don't need to be 100% human-readable, but they should generally be intelligible. Ideally, we would also like something that has a close correspondence with the path from the root of the control program to the node of interest.

While we could imagine defining a standardized numbering schema where we simply begin counting up from the root, this is challenging to interpret without the program and somewhat error-prone. There are also some instances which are complicated by if statements since an if can be written without an explicit false branch that is nevertheless present. I do think the idea would be useful for a more interactive specification when using Cider in debug mode, though.

Instead, I propose something a little closer to a spelled out path traversal. Syntax is provisional for the moment. Consider the following program:

control {
    seq {
       grp_a; 
       grp_b;
       grp_c;
    }
}

We can identify the root as main::seq and it's first child as main::seq.0_enable and so on for the subsequent children. par blocks treat their children in much the same way.

Consider:

  control {
    seq {
      let0;
      while le0.out with cond0 {
        seq {
          upd0;
          invoke exp0(base = a_read0_0.out, exp = const3.out)();
          let1;
          upd1;
          upd2;
  }}}

Suppose we wish to refer to the invoke statement. Following the scheme we would have something like:

main::seq.1_while.seq.1_invoke

For if statements things are a bit less obvious since we need to be able to refer to one of two bodies.

 control {
    if lt.out with cond {
      true;
    } else {
      false;
    }
  }

One choice would be to use numbers in much the same way as seq and par with 0 being the true branch and 1 being the false branch. Alternatively we could use t and f. Addressing the false branch in the above program would then be:

main::if.1_enable

or

main::if.f_enable

Syntax suggestions welcome as I'm not entirely satisfied with underscore being the index separator but think it is harder to understand if things are run together sans separation.

This looks like a great start! Could you add examples for multi-component programs? Also, what ensures that there never is a name clash?

Thanks for starting this! I should mention: the current @ControlID (#1366) and @NodeID stuff might also benefit from this. @calebmkim might have thoughts since this is used in resource sharing and (maybe?) the new FSM stuff.

This looks like a great start! Could you add examples for multi-component programs? Also, what ensures that there never is a name clash?

So for multicomponent programs the structure would essentially be the same just changing the leading component definition name. That of course refers to the control node in the definiton, rather than a particular instance. I suppose if we were interested in pointing to a particular control node in a particular instance you could further extend the beginning to be the fully qualified name of the cell rather than the component name, like so:

<FULL CELL NAME>::<PATH>

Could you say more about which name clashes you're concerned about?

Could you say more about which name clashes you're concerned about?

I guess the argument here is that there are no user-defined names and that statements of the same type get numbered, so there is no chance for a collision, correct?

Yeah, that's more or less my though process. The only user defined names should be the component names themselves or the fully qualified name of the cell, neither of which should be able to clash. Children of if, par, and seq are numbered so that you can distinguish between, say, multiple enables at the same depth in the tree without explicitly naming each enabled group or even multiple enables of the same group. while loops have exactly one child, so no numbering is needed there. enable, invoke, and empty are all terminal nodes, so as long as the path to them is correct there should be no ambiguity.

@eliascxstro

The modified proposal, after a discussion with @sampsyo, looks like the following.

<COMP_NAME>: PATH_DESCRIPTOR

Where . refers to the root node of the control program. .0 refers to the zeroth child of the root component. .0-0 refers to the zeroth child of the zeroth child of the root. For if blocks use t for the true child and f for the false. For while use b for the body.

Annotating my example programs:

control {
    seq {               \\ .
       grp_a;           \\ .0
       grp_b;           \\ .1
       grp_c;           \\ .2
    }
}
  control {
    seq {                                                                        \\ .
      let0;                                                                      \\ .0
      while le0.out with cond0 {                                                 \\ .1
        seq {                                                                    \\ .1-b
          upd0;                                                                  \\ .1-b-0
          invoke exp0(base = a_read0_0.out, exp = const3.out)();                 \\ .1-b-1
          let1;                                                                  \\ .1-b-2
          upd1;                                                                  \\ .1-b-3
          upd2;                                                                  \\ .1-b-4
  }}}
 control {
    if lt.out with cond {       \\ .
      true;                     \\ .t
    } else {
      false;                    \\.f
    }
  }

At the risk of bikeshedding on syntax, we don't need to use - as the separator character or . as the root designator (could use r) and we could mandate a separator character between the root designator and subsequent path node. In the future we could also allow optional annotations (like my original proposal) which indicate what the type of the node is.