WebAssembly/nontrapping-float-to-int-conversions

Bikeshed: naming conventions

rossberg opened this issue · 11 comments

Apropos of this thread, I noticed that the current naming conventions for the new conversion operators are somewhat off with the conventions used elsewhere in Wasm: in all other cases, the _u/_s prefix always goes last, or next to the source type for conversion operators.

To make it more consistent, I would propose to name the new instructions

  • i32.trunc_sat_s/f32
  • i32.trunc_sat_u/f32
  • ...

Arguably, this is also more fitting because it views "trunc_sat" as the core conversion operation that is annotated with the various type and signedness variations it comes with.

The current naming scheme intends to convey that "trunc" is the core conversion operation, which is a function that happens to be undefined (in the math sense) for some possible inputs. The ":sat" naming scheme is meant to convey:

  • the default for instructions in WebAssembly is to trap when the mathematical result can't be expressed in the result type
  • some instructions may allow this behavior to be overridden with modifiers like ":sat".

This convention would generalize to things like div_s or rem_u or others which are also functions that are similarly undefined for some possible inputs, where it could theoretically be desirable some day to define non-trapping forms. As another example, a hypothetical add_s would trap on overflow, and add_s:sat would saturate on overflow.

Yes, I see that, and agree. But why does that suggest trunc_s:sat instead of trunc:sat_s (same for other potential operators)?

Here is one way to describe the evaluation of i32.trunc_s:sat/f32:

  • the operand has type f32
  • the trunc of the operand value is computed; the result is one of these:
    • an integer value
    • "invalid" (on NaN)
  • this is a _s operation, so an integer value is converted to one of:
    • a signed i32 value
    • "positive overflow"
    • "negative overflow"
  • a :sat is present, so:
    • "invalid" is converted to 0
    • "positive overflow" is converted to the maximum signed i32
    • "negative overflow" is converted to the minimum signed i32
  • if the result is still not a signed i32, trap

One could reasonably combine the _s and :sat parts. But, keeping them separate means the _s part, and the other parts, can all be shared with the non-:sat form, which is nice. So this suggests a sequence of trunc, _s, and then :sat.

[Sorry for the delay, still working through my discussion backlog]

I see what you mean from an implementation perspective. But would you argue that that is the only relevant consideration?

I would think that a user probably cares most about applicability and types. Imagine they have a signed 32 bit int. The advantage of the current Wasm naming scheme is that they can easily identify all applicable operators because they are uniformly named either i32.xxx_s, or i32.xxx without signedness. Similarly, to find all ways to convert from signed i32 to some T they simply find all operators named T.xxx_s/i32. That seems like a rather valuable organisation principle.

No, I don't consider it an implementation perspective. For reasoning about a program, the case where a computed result approximates the real result of an algorithm is different from the case where it's unrelated to the algorithm.

I'm confused, of course they are different, but what does that imply about naming conventions, other than that they shouldn't have the same name?

Mnemonics should be abbreviated descriptions of what their instructions do, as clearly as possible while still being short. Behavior on overflow is a user-relevant part of what they do. I described above how the proposed mnemonics aim to address that.

I am also unclear on the value of optimizing for the length of the regex needed to discover certain kinds of opcodes. Good documentation will group the conversion opcodes in a way that makes them easy to discover.

From a purely practical perspective, it would be great if we could keep colons out of mnemonics. The LLVM assembly lexer treats colons as separate tokens, so we would have to hack around it to correctly assemble instructions containing colons. FWIW, the same is true of slashes, but that ship has probably sailed. This hasn't come up before because we have not previously had comprehensive tests of the assembler. @aardappel is planning to look at what would be required to support instructions containing these characters next week.

@tlively, agreed. Using up a whole new special character convention for this purpose seems subideal.

FWIW, you can blame the slash on me. In retrospect, I wouldn't mind eliminating it, e.g., in favour of dot perhaps, or underscore? But it might be too late.

Using . would have been great, since we already use it.

@rossberg @aardappel I'll add an agenda item to the next CG meeting so we can settle the question of whether it's too late sooner rather than later.