Documentation formatting
bradleyharden opened this issue · 6 comments
Introduction
Before we can implement a VHDL domain for Sphinx, we will need to make a few decisions about how to display and link descriptions of objects. I expect that the most important objects to document will be functions, procedures and entities. Functions and procedures have an advantage, since their syntax is effectively identical to traditional function calls in a programming language. Entities, on the other hand, are very different from function calls.
In Sphinx, an object description starts with a signature node that describes the objects call signature. The object body contains fields or field groups. For example, a field might be the function return type, while a field group would be a list of the function parameters. This structure is pretty baked in, and if we stray too far from this structure, we will end up having to re-implement many of the pieces ourselves.
Below is my proposed design. It is meant purely as a starting point to discuss the merits of different ideas. I still have many open questions, and I'm not at all sure of the best way to do this. Broadly, I think there is a tension between two different approaches. We could make the reStructuredText easy to write, and put a lot of work into parsing within Sphinx. Or we could make the reStructuredText more complex to ease the burden on Sphinx.
The example below represents an approach that is closer to the former, but I think it gives the nicest looking end result.
Proposed reStructuredText
.. vhdl:entity::
entity lib.entity_name is
generic(
type type_t;
const1 : type_t;
const2 : pkg.other_type_t;
const3 : integer := 0;
function func(param : type_t) return std_ulogic);
port(
reset : in std_ulogic;
clock : in std_ulogic;
input : in std_ulogic := '1';
output : out pkg.record_t);
:generic type_t: Description of type
:generic const1: Description of constant
:generic const2: Description of constant
:generic const3: Description of constant
:generic func: Description of subprogram. I would have to manually
describe the function parameters here.
:port reset: Description of port
:port clock: Description of port
:port input: Description of port
:port output: Description of port
Corresponding proposed layout
entity lib.entity_name is
generic (
type type_t;
const1 : type_t;
const2 : pkg.other_type_t
;
const3 : integer
:= 0;
function func(param : type_t) return std_ulogic
));
port (
reset : in std_ulogic
;
clock : in std_ulogic
;
input : in std_ulogic
:= '1';
output : out pkg.record_t
);
Description of entity
Generics: • type_t — Description of type
• const1 — Description of constant
• const2 — Description of constant
• const3 — Description of constant
• func — Description of subprogram. I would have to manually
describe the function parameters here.
Ports: • reset — Description of port
• clock — Description of port
• input — Description of port
• output — Description of port
Discussion
The object description begins with the word entity, which is placed as a signature annotation. I think this fits well, because entity is similar to class in Python, it annotates the object with its use in the language.
The library is prepended to the entity name as a signature addname. Like modules or class names in Python, an addname tells you the fully specified name of the object.
The entity name is added as a signature name.
The generics and ports would have to be added as two separate signature parameter lists with custom formatting. By default, signature parameter list nodes print all parameters on a single line that is only wrapped if necessary. Personally, I think this approach would make VHDL entities extremely hard to read, so I think it would be worth it to implement custom behavior here.
In Python documentation, parameter types are not usually given in the object signature. However, because generic and port types are integral to the entity signature, I thought it would be good to keep them. If we do keep them in the signature, then we should probably cross-reference them where possible. The literal boxes
in the layout above are meant to represent types that would link to their corresponding descriptions. However, I see one tricky thing here; how would Sphinx know that occurrences of the generic type type_t
in the generic list should not be turned into links? If there happened to be a type_t
defined somewhere in the documentation, then Sphinx could produce an incorrect link. I think this is a tricky parsing problem.
Parsing generic subprograms is also a challenge. We could potentially nest generic subprograms, like methods in a class, but that might get complicated quickly.
In general, the above approach leaves a lot of parsing to Sphinx.
Alternatives
One major alternative would be to limit the entity signature to just the annotation, addname and name nodes. There would be no parameter lists providing the generics and ports in the entity signature. Instead, descriptions of the types and default values would fall to the field groups.
.. vhdl:entity:: lib.entity_name
:gent type_t: Description of type
:genc const1: Description of constant
:genc_type const1: type_t
:genc const2: Description of constant
:genc_type const2: pkg.other_type_t
:genc const3: Description of constant
:genc_type const3: integer := 0
:genf func: Description of subprogram. I would have to manually
describe the function parameters here.
:genf_type func: (param : type_t) returns std_ulogic
:port reset: Description of port
:port_type reset: in std_ulogic
:port clock: Description of port
:port_type clock: in std_ulogic
:port input: Description of port
:port_type input: in std_ulogic
:port output: Description of port
:port_type ouput: out pkg.record_t
There's still some parsing to be done by Sphinx, but the burden is reduced. For example, Sphinx no longer needs to tell the difference between generic types, constants, functions and procedures. The field name tells it what the generic is. However, things are still a little bit awkward, because the "type" for each field has more than just the type. For instance, the "type" for a generic constant could contain both a type and a default value. The "type" for a port contains the port direction and the type.
Right now, Sphinx is able to automatically parse a TypedField. For example, for Python parameters, it can take
:param foo: Description of foo
:type foo: str
and deliver a pair of dictionaries with the descriptions and types both accessed by key foo
. However, for VHDL generics and ports, there is more than just one extra piece of information that needs to go along with each parameter. We can either lump both the port type and port direction into the "port_type" field, or we can write our own field parser that will allow us to split them out into three fields, i.e.
:port foo: Description of foo
:port_type foo: std_ulogic
:port_dir foo: in
This is easier to write, but it adds extra work on the Sphinx side of things.
Finally, using this strategy, how would you display the documentation. Something like this?
entity lib.entity_name
Description of entity
Generic • type_t — Description of type
types:
Generic • const1 : type_t — Description of constant
constants: • const2 : pkg.other_type_t
— Description of constant
• const3 : integer
:= 0 — Description of constant
Generic • func(param : type_t) return std_ulogic
subprograms: Description of subprogram. I would have to manually
describe the function parameters here.
Ports: • reset : in std_ulogic
— Description of port
• clock : in std_ulogic
— Description of port
• input : in std_ulogic
— Description of port
• output : out pkg.record_t
— Description of port
This is closer to the way Sphinx displays Python code, but the entity lacks any signature. You can't see all of the parameters at a glance. Rather, you have to parse the corresponding lists. Although this option may be preferable, since the signature in the first example wasn't the most readable anyway.
Conclusions
I think we need to settle on answers to these questions before we can move forward. Moreover, I think we need several educated opinions on this issue who know the Sphinx code base and have some idea of how we might implement the solution we choose. I think I qualify, now that I've read quite a bit of the Sphinx code for Python. But I would really like to see some other thoughts on the issue.
In reStructuredText, you can nest directives. I was thinking of something like this:
.. vhdl:currentlibrary:: PoC
.. vhdl:entity:: arith_prng
description text
.. vhdl:generics::
.. vhdl:constant:: BITS
:type: positive
:default: 8
Define number of output bits per cycle.
.. vhdl:constant:: INIT
:type: bit_vector
:default: "10010110"
Initial PRNG value after start or reset (seed value).
.. vhdl:ports::
.. vhdl:signal:: Clock
:type: std_logic
:mode: in
.. vhdl:signal:: Reset
:type: std_logic
:mode: in
Synchronous, high-active reset.
In VHDL, many constructs are repeated and reused in the language. A port is a signal and an ordinary generic is a constant. Adding functions, packages or types to generics will be simple (if my idea works :) ), because we reuse existing directives.
Entities, packages, functions, procedures (and since 2019 protected types) can have generics. If we get nested directives to work, we would be very powerful, right? As far as I understood Sphinx, a directive can query if she is nested in another directive of type X. Thus we should be able to use other styling if a constant is in a package or generic list.
Yes, that's what I thought you were getting at when I looked at the README. My concern is that your approach is different from the implementation for Python and, I assume, other programming languages.
Here is an example given in the Sphinx documentation for Domains:
.. py:function:: send_message(sender, recipient, message_body, [priority=1])
Send a message to a recipient
:param str sender: The person sending the message
:param str recipient: The recipient of the message
:param str message_body: The body of the message
:param priority: The priority of the message, can be a number 1-5
:type priority: integer or None
:return: the message id
:rtype: int
:raises ValueError: if the message_body exceeds 160 characters
:raises TypeError: if the message_body is not a basestring
The .. :py:func::
directive is a subclass of ObjectDescription
. When this directive is run, it uses the directive argument to create a series of nodes describing the function signature. Next, it parses the nested content into respective nodes. In the example above, it creates a text node for the function description and a single field list node for the 9 fields. After parsing the nested content, the ObjectDescription
directive goes back and "cleans up" the field list. It transforms the field list from 9 fields to 4 (Parameters, Return value, Return type, and Raises). It parses the 9 fields and allocates them to the 4 "real" fields. Fields with multiple members (e.g. Parameters) are GroupedField
s. The body of each GroupedField
is transformed into a bullet list containing its respective elements. Finally, the ObjectDescription
directive replaces the 9-member field list with the new, 4-member field list. Later, the HTML writer transforms the 4-member field list into the documentation you see.
Effectively, each Python directive captures all information about an object in one of two places, either in the directive arguments or in a single field list within the directive body. Nodes created by nested directives are not re-processed at all. Some context is passed down to nested directives, but it is limited to things like module and class names, so that a fully specified name for each object can be built.
I am a little averse to straying from this model without any significant experience with Sphinx or any example to work from. Maybe one of the other languages does things differently?
My initial aversion is based on the following reasoning. Suppose you had a .. :vhdl:signal::
directive. If it were nested inside a .. :vhdl:package::
directive, you would want the .. :vhdl:signal::
directive to produce a stand-alone signature node and associated body node. (The net
signal in the VUnit com library is a good example of this.) Alternatively, if the .. :vhdl:signal::
directive were nested inside a .. :vhdl:port::
directive, then you would want the .. :vhdl:signal::
directive to append its data to an existing list of ports. Thus, the action taken by the .. :vhdl:signal::
directive is strongly context dependent. To me, that smells like bad design.
However, after thinking about it a bit more, I may have found a good way to implement what you describe. In the Python domain, each directive parses all parameters, exceptions, etc. as a single child node (a field list) and then it transforms that node as necessary. But in your reST structure, the ports, generics, etc. would be spread over many different child nodes. Consequently, the VHDL directives would have to iterate through all child nodes to consolidate the different pieces, and then they would have to add and delete nodes to clean everything up. I think this is a reasonable approach that could work, but it is not how the Python domain works.
If we took this approach, I think I would implement two different node "types". There would be "data structure" nodes like signal
and constant
that would never exist in the final tree. Rather, they would always be transformed by some higher level directive. The "real" directives, like port
, generic
and entity
would transform the data structure nodes and format them appropriately.
I'm not against this approach, but it does lead to verbose reStructuredText. It's probably not something you would want to write by hand. However, if we could leverage a VHDL parser to generate most of it, then it wouldn't be that bad. We could also implement something like the Napoleon plugin to make it easy to write "docstrings" in the VHDL.
Separately, I would like to note that the reStructuredText and the final documentation appearance are not necessarily closely related. In principle, we could significantly re-format the nodes after parsing the reStructuredText.
Thus, my question above still stands. What do we want the documentation for an entity to look like? I proposed two different styles. Do you like either? Would you combine them in some manner?
What do we want the documentation for an entity to look like?
I the case of sphinx-hwt it looks like your second variant: https://hwtlib.readthedocs.io/en/latest/hwtLib.amba.axi_comp.cache.html?highlight=cache#module-hwtLib.amba.axi_comp.cache.cacheWriteAllocWawOnlyWritePropagating
I think that your second variant is better because port types are often records etc. and this information is important.
About subprograms, types and other definitions/declarations in entity: I think that they should be treated as standard function/class variables in doc for python because they are not API of entity and thus the info about it needs to be somewhere in the body of doc for entity.
Coming here from TerosTechnology/colibri#244.
What do we want the documentation for an entity to look like?
I prefer the second option but I think there needs to be a signature. I think it'd be weird for everything to be on the same line. One way to accomplish this would be to make it collapsible by default:
+ entity lib.entity_name...
Description of entity
Generics:
Constants: • const1 : type_t — Description of constant
• const2 : pkg.other_type_t — Description of constant
• const3 : integer := 0 — Description of constant
Types: • type_t — Description of type
Subprograms: • func(param : type_t) return std_ulogic ...
Ports: • reset : in std_ulogic — Description of port
• clock : in std_ulogic — Description of port
• input : in std_ulogic — Description of port
• output : out pkg.record_t — Description of port
Then expanding the +
would show the multi-line signature:
- entity lib.entity_name is
generic (
type type_t;
const1 : type_t;
const2 : pkg.other_type_t;
const3 : integer := 0;
function func(param : type_t) return std_ulogic));
port (
reset : in std_ulogic;
clock : in std_ulogic;
input : in std_ulogic := '1';
output : out pkg.record_t);
Description of entity
Generics:
Constants: • const1 : type_t — Description of constant
• const2 : pkg.other_type_t — Description of constant
• const3 : integer := 0 — Description of constant
Types: • type_t — Description of type
Subprograms: • func(param : type_t) return std_ulogic ...
Ports: • reset : in std_ulogic — Description of port
• clock : in std_ulogic — Description of port
• input : in std_ulogic — Description of port
• output : out pkg.record_t — Description of port
If I was reading through the documentation for something that I might use, having a signature or instantiation template is something required IMO. It's there if you need it but it can be collapsed to keep docs concise and readable.
As for direction, type, and default value, I think they should just go in parenthesis like (direction, type, default value)
. I'm not entirely sure how this should look because type ranges/default values can get ugly fast. How would Sphinx handle function calls/type-casts in port maps? I.e. what would show up in the docs for something like some_input : in std_ulogic_vector(clog2(SOME_GENERIC)-1 downto 0) := std_ulogic_vector'(all_ones(clog2(SOME_SIZE_GENERIC)))
?
Would the docs display documentation for an architecture? If so, what happens if there's multiple architectures for an entity? I think that in most cases, people would use this stuff for internal documentation and would want the ability to show "private" information about an entity like the internal workings of the architecture (constants, signals, processes, etc.). TerosHDL for example can generate an FSM diagram, this might be useful to have in documentation, but if an architecture is ignored, could you still get at the architecture's documentation via something like :vhdl:architecture:
/:vhdl:arch:
?
If I was reading through the documentation for something that I might use, having a signature or instantiation template is something required IMO. It's there if you need it but it can be collapsed to keep docs concise and readable.
This is a good point. I might even go further and say that the signature should be expanded by default.
How would Sphinx handle function calls/type-casts in port maps? I.e. what would show up in the docs for something like
some_input : in std_ulogic_vector(clog2(SOME_GENERIC)-1 downto 0) := std_ulogic_vector'(all_ones(clog2(SOME_SIZE_GENERIC)))
?
That's a good point. I have no idea. I would hope no one would actually write something like that, because it's horribly unreadable.
I think you said you don't like tables, but that example would at least be tractable with a table.
Would the docs display documentation for an architecture? If so, what happens if there's multiple architectures for an entity?
I think this is a situation we should ignore for the moment. I expect those situations to be quite rare and not worth the added effort to handle on a first attempt.