Vectors and name collisions

Question

Vectors and name collisions

mbty opened this issue 4 years ago · 2 comments

In Kôika projects including vectors, obtaining a name collision is trivial - simply add a register named <vector_name>_0. This is due to the way names are built through concatenation in coq/DerivingShow.v: the names generated are themselves valid Coq identifiers. For a demonstration, see commit 6e2010a on my fork. In this example, a register named rData_0 has been added to examples/vector.v and g++ doesn't appreciate that.

There are several possible workarounds, and the simplest I can think of is to use the $ character in the concatenation phase instead of _. The reason why this works is that the C++, Verilog and DOT backends all accept this character in register names whereas Coq doesn't allow it in identifiers. This follows from the definition of an identifier in the standard for C++ and Verilog, and from the fact that register names are printed between quotes for .dot files. See commit c11c463 for a demonstration of this possible fix (examples/vector.v has not changed since the previous commit yet it compiles now).

If support for $ in identifiers is to be added in some future Coq release, then this character should be considered reserved by Kôika. If another backend without support for $ in identifiers is to be added to Kôika in a future release, then there wouldn't be much of a problem. What matters is that Kôika outputs unique names, and these can always be adapted from cuttlec to suit the specifics of given backends.

Some other related but less problematic issues exist. For instance, there is no upper bound to the length of Coq identifiers yet the same can not be said of Verilog, where the lowest maximum length tolerated by the standard is 1024. On a similar note, Coq is in general much more tolerant than Verilog when it comes to the characters allowed in identifiers: using accents (let alone non-breaking spaces) in a Verilog identifier is not accepted. Writing Kôika code that typechecks in Coq yet results in an error further down the line is therefore possible and even easy. As Kôika outputs unique names, this kind of problem can always be managed from cuttlec. In practice, this problem is easy to deal with and in fact the solution is always some register renamings away.

The context of this question is that I am looking for a way to generate Verilog through Kôika in such a way as to get BRAM blocks to be used through inference when using vectors past a certain size. Assuming that registers stored in BRAM don't behave differently from registers stored using flip-flops, this shouldn't violate any of Kôika's expectations. It could for instance be interesting to model the memory of the RISC-V example directly in Kôika instead of relying on the current workarounds.

However, it seems to be possible to define a vector of registers of varying sizes through Vect as follows:

Definition R r :=
  match r with
  | rData n => bits_t (index_to_nat n)
  end.

This does not correspond to a vector in the traditional sense and it would in particular not be suitable for BRAM storage. Is this behavior desired or just a natural consequence of the way registers are managed? Why is reg_t defined directly through Inductive rather than through a custom construct?

There is a link between the previous question and the "related but less problematic issues" mentioned above: a way of solving them would be to limit the accepted names for register names by avoiding direct reliance on Inductive for their definition (and maybe likewise of Definition for R and such). Furthermore, not treating registers and vectors uniformly may reveal practical for my purpose. In fact, limiting the definition of reg_t to raw registers and vectors (in the traditional sense of the word) covers all the use cases of Kôika I can think of (I mean, registers alone would suffice but vectors make life way easier) - am I missing situations where a full blown inductive type is useful?

I think this digression matters as quite a lot of systems that would be interesting to model using Kôika have some notion of memory, which is best represented using vectors, which are in turn best stored in BRAM when targeting FPGAs.

Answer 1 · 2021-03-09T00:28:08.000Z

Thanks a lot for the report. I will expand on the following when I have time (probably next week, if that's OK).

On DerivingShow: It could/should be made smarter to avoid collisions; in the meantime it's the user's job to provide custom names or avoid collisions in naming. Using $ would lead to less readable names in C++, so I'm not too sure about that.
On vectors: The way reg_t works makes the semantics a lot simpler (no need to reason about finite sets of strings). Kôika lets you define family of vectors, and in your example they indeed don't all have the same size.
On verilog names: we should do proper encoding of names like it's done in C++ (where we need tricks to avoid __, for example).

Answer 2 · 2021-04-02T13:41:07.000Z

As promised, some details:

In practice, this problem is easy to deal with and in fact the solution is always some register renamings away.

I think that's key. I'd rather not invest too much in generating complicated names, since renaming makes the generated nicer too. We could try to issue a warning in case of name collisions, though. And we should definitely check for too-long verilog names.

The context of this question is that I am looking for a way to generate Verilog through Kôika in such a way as to get BRAM blocks to be used through inference when using vectors past a certain size.

That's a nice goal, though I think it's a bit hard with the current Kôika set-up. The issue is basically that in Kôika right now everything needs to be reversible, and writing to a BRAM isn't really, so you need to make sure that you write to the corresponding array only from a rule that cannot fail. Without doing guard lifting or similar tricks, you find yourself playing the game that the memory bus rule plays in the RV core. The general solution would be to have a clean notion of modules with a request/response interface, and @threonorm is working on that.

This does not correspond to a vector in the traditional sense and it would in particular not be suitable for BRAM storage. Is this behavior desired or just a natural consequence of the way registers are managed?

It's desired; for a bram-style device, you'd probably want to use array_t instead — but for reasons of trust we actually compile arrays away when lowering to Verilog (it wasn't always like that; we used to produce arrays in the generated Verilog, but we removed them after a while). Reintroducing types at the RTL level could be a good way to start tackling this problem; it would also make it possible to introduce a bram type. But it would be quite a bit of work.

am I missing situations where a full blown inductive type is useful?

You get very nice compositionality properties from using an inductive; in particular, you can nicely isolate the type of each submodules without having to deal with strings (that is precisely the kind of aliasing problems that pop up when using show), and you can trivially define environments based on FiniteType instances.