riscv-non-isa/riscv-asm-manual

Proposal for a common convention for synonyms for register ABI names in hand-coded assembler

scotws opened this issue · 8 comments

(Moved from riscv/riscv-isa-manual#825 at suggestion of @nick-knight; added section on alternative solutions)

The sheer number of RISC-V registers can make it hard to keep track of what has been assigned where when coding assembler by hand. Some assemblers allow the creation of synonyms -- "renaming" -- for the ABI names of the registers via .equ, .eqv or a similar statement, which can help. Such synonyms can also reduce the chance that a single typo -- say, from t0 to t1 -- remains unnoticed by coder and assembler.

However, there is currently no standard or recommended best practice for this that I am aware of. Providing such a recommendation or standard could cut down on the number of variants in the wild, aiding readability and supporting debugging of hand-written code.

This issue proposes adding a simple, short recommendation for a "common convention" or "best practices" for creating synonyms for ABI register names when hand-coding assembler, for inclusion in The RISC-V Instruction Set Manual Volume I: Unprivileged ISA, presumably in Chapter 25. The assumption is that this is going to happen anyway, so an attempt should be made to nudge coders towards a common format.

Criteria

To keep it simple, there are three criteria for this scheme:

  1. Synonyms should only be created if many registers are in use at the same time and only for registers that are used for a longer time. In no way should "renaming" become the default practice.
  2. The original ABI register name should be conserved to aid debugging. The idea is to add information, accepting the inherent redundancy for the sake of human readers and error detection.
  3. In addition, an indicator of the type of data stored in the register or its intended use may be included for the same reason.

Syntax

The original register ABI name is kept, but followed by an underscore; then an optional single letter for the type or use, followed by a second underscore; finally the user-chosen name.

Examples:

    .eqv t0_n_cats t0      # current number of cats
    .eqv s1_p_catname s1   # pointer to the beginning of a string
    .eqv s2_f_havetuna s2  # flag: set once the goal has been achieved

As an initial suggestion for the type indicators:

  • c counter
  • f flag
  • l loop counter
  • n number of
  • p pointer

Important: The original ABI name of the register remains untouched and fully functional. Any new definition creates a synonym, not a replacement.
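A minimal sketch of that point, assuming an assembler in which .eqv performs textual substitution (RARS, linked further down in this thread, is one such assembler; GAS is not). The label main and the values are placeholders chosen for illustration:

```asm
    .eqv t0_n_cats t0          # synonym for t0: current number of cats

    .text
main:
    li   t0_n_cats, 3          # assembles as: li t0, 3
    addi t0, t0, 1             # the plain ABI name still names the same register
    mv   a0, t0_n_cats         # a0 now holds 4
    li   a7, 10                # RARS system call 10: exit
    ecall
```

Both spellings can be mixed freely; the synonym only adds information for the human reader.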

Example: The Battle of the Four Armies

Consider as a toy example a battle between hobbits, orcs, elves, and dragons, whose numbers are stored in s0, s1, s2, and s3 respectively. A code snippet to decide the winner could look like this:

        bgtz s3, dragons_win
        bgt s1, s2, orcs_win
        bgt s2, s1, elves_win
        j hobbits_win

Since these registers are used heavily in the rest of the code as well, we create synonyms following the scheme proposed:

    .eqv s0_n_hobbits s0  # number of hobbitses
    .eqv s1_n_orcs s1     # number of orcs
    .eqv s2_n_elves s2    # number of elves
    .eqv s3_n_dragons s3  # number of dragons

Our code snippet then becomes:

        bgtz s3_n_dragons, dragons_win
        bgt s1_n_orcs, s2_n_elves, orcs_win
        bgt s2_n_elves, s1_n_orcs, elves_win
        j hobbits_win

The additional information makes the code more readable and the logic easier to follow while preserving all original information. Also, a typo such as t2_n_elves will now be caught by the assembler because this name is not defined.

Drawbacks

  • Creating synonyms for register names or renaming registers is uncommon, since most processors simply do not have enough of them for these problems to occur. Any such change could cause confusion when readers are first confronted with code that contains renamed registers. Some people will object on principle.
  • There is currently no assembler-level support for this; some assemblers might even refuse to create register synonyms, and this might break some compiler/assembler-level checks. This is a chicken-or-egg problem, as a convention for register synonyms would provide a basis for assembler writers to implement this function.
  • Register renaming might start to be considered a must or best practice in itself for hand-coding assembler, not a tool to be used sparingly in situations where a large number of registers in use becomes unwieldy.
  • There is currently no way to take back or delete synonyms once stated.

Alternatives

It seems that some people are currently using #define statements to implement this functionality. However, RISC-V should not be dependent on a single high-level language for this. Given the (slow) rise of Rust as a C alternative, this current solution seems especially problematic.
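For reference, a sketch of what the #define approach might look like for the Four Armies example, assuming a .S file that is run through the C preprocessor before it reaches the assembler (for instance via a GCC-style driver). The file name, the routine name decide_winner, and the assumption that the *_win labels are defined elsewhere are all illustrative:

```asm
/* armies.S: illustrative only. C-style comments are used here because a
   leading '#' can be mistaken for a preprocessor directive. */

#define s0_n_hobbits s0         /* number of hobbits */
#define s1_n_orcs    s1         /* number of orcs    */
#define s2_n_elves   s2         /* number of elves   */
#define s3_n_dragons s3         /* number of dragons */

        .text
        .globl  decide_winner
decide_winner:
        bgtz    s3_n_dragons, dragons_win
        bgt     s1_n_orcs, s2_n_elves, orcs_win
        bgt     s2_n_elves, s1_n_orcs, elves_win
        j       hobbits_win

/* Retire the synonyms at the end of the routine. */
#undef s0_n_hobbits
#undef s1_n_orcs
#undef s2_n_elves
#undef s3_n_dragons
```

With the preprocessor, #undef offers a way to withdraw a synonym and so to limit it to a single routine. A misspelled name such as t2_n_elves is not expanded, so it should reach the assembler as an unknown operand and be rejected there.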

Open questions

  • A suggestion for the scope of using these modified names should be included, possibly on a per-subroutine or segment basis.

Thanks for moving your proposal to this repo.

I'd like to echo the comments Paul (riscv/riscv-isa-manual#825 (comment)) and Nick (riscv/riscv-isa-manual#825 (comment)) made.

FWIW, I'm in the camp that the C preprocessor already satisfies this need--elegantly, at that.

The #define solution leaves anybody without a C-based system out in the cold -- see https://github.com/TheThirdOne/rars for instance. Shouldn't RISC-V be independent of C, at least in the long term?

I'm sympathetic to that concern, but in no way is reliance on the C preprocessor reliance on the C language. No one's being left out in the cold (unless they're rejecting use of the C preprocessor for anti-pragmatic reasons).

Okay. How about I create a pull request that states the problem and points to C preprocessor as a software-supported solution in a general way, mentions the above naming scheme as a poor substitute if that is not possible, and then somebody who has more experience with (and appreciation for) the C preprocessor adds some examples and hints for best practices?

Makes sense to me.

@nick-knight Sorry, you had mentioned in another comment that you use the preprocessor method extensively in your code. Might you have any recommendations or even some sort of a style guide based on your experience, or a recommendation of what not to do? Thank you!

I assume most developers who write assembly code use .S files (assembly code that is preprocessed by a C preprocessor). For example, glibc and Linux use this approach. But some projects also have their own solutions (OpenSSL uses Perl to generate assembly code). I'm convinced that a RISC-V assembler (as defined by this repository) should not aim to provide equivalent solutions. Instead, I prefer standardizing only a minimal set of RISC-V-specific assembly directives (those which are necessary). Otherwise, we introduce a long list of required features for a RISC-V assembler (include directives, binary includes, conditional compilation, etc.) that nobody wants to implement or use, and that are already available through existing mechanisms.

Also note that a project like rars could use a stand-alone C preprocessor to be compatible with code from .S files (I recently stumbled over a project that depends on mcpp for a similar purpose).

None of that means that feature-rich assemblers cannot come up with their own solutions. E.g. GAS supports include directives and conditional compilation. In fact, GAS's ARM/AArch64 backends also support register aliases using the .req directive:

    .globl return_zero
    retval .req w0
    return_zero:
            mov     retval, 0
            ret

One last hint: both .equ and .eqv already have defined semantics in GAS. So I would expect objections to any GAS patch that tries to repurpose these directives for register synonyms.

Obviously this is a solution that nobody seems to see a problem for. I'm going to close this issue, and if it comes up again in a few years we'll know where to find it. Thank you everybody for your feedback!