WebAssembly/module-linking

Alias shorthand syntax

Closed this issue · 19 comments

The explainer suggests the notation $i.$j as an inline shorthand for alias definitions. Unfortunately, this is problematic for at least two reasons:

  • That syntax is already a valid symbolic identifier, so it would be ambiguous with ordinary uses of such identifiers.

  • Its desugaring depends on what sort of entity is referenced: a symbolic identifier may be bound in multiple name spaces (e.g., $j may simultaneously name a func and a global). The desugaring hence becomes context-dependent, and may be ambiguous in that regard as well.

To avoid such issues, I suggest we use (type $i $j), (func $i $j), etc for such shorthands, in a natural extension of the (type $i) style forms that the text format already uses. In fact, we may want to generalise this to (type $i $j*) to conveniently deal with multiple nesting levels (which recursively desugars into a sequence of instance and type aliases).
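The recursive desugaring of (type $i $j*) can be sketched in Python. This is a minimal sketch with assumptions that are not part of the proposal. Aliases are modelled as plain tuples rather than concrete text syntax. The function name desugar_path and the fresh identifiers ($alias0, $alias1, ...) are hypothetical. Each intermediate projection becomes an instance alias, and only the last step aliases an entity of the requested kind.

```python
from itertools import count
from typing import Iterator, List, Tuple

# Abstract alias record (hypothetical): (kind, fresh id, source instance, member id)
Alias = Tuple[str, str, str, str]


def desugar_path(kind: str, path: List[str], out: List[Alias],
                 fresh: Iterator[int]) -> str:
    """Desugar (kind $i $j*) into aliases appended to `out`, returning the
    identifier the original use site should refer to instead."""
    if len(path) == 1:
        return path[0]  # plain (kind $x): nothing to desugar
    name = f"$alias{next(fresh)}"  # hypothetical fresh-name scheme
    if len(path) == 2:
        # Last step: alias the member itself, with the requested kind.
        out.append((kind, name, path[0], path[1]))
        return name
    # Intermediate step: alias a nested instance, then desugar the rest.
    out.append(("instance", name, path[0], path[1]))
    return desugar_path(kind, [name] + path[2:], out, fresh)


aliases: List[Alias] = []
ref = desugar_path("type", ["$i", "$j", "$k"], aliases, count())
print(ref)      # $alias1
print(aliases)  # [('instance', '$alias0', '$i', '$j'), ('type', '$alias1', '$alias0', '$k')]
```

Passing the fresh-name counter in explicitly, rather than using a global, keeps the generated identifiers deterministic per module.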

Moreover, we should analogously support parent alias shorthands like (type parent $j+) etc.

Oh wow, I wasn't aware of the second bullet. Explicit syntax like you're suggesting makes sense then, but one thing seems a little reversed: (type $t) is used to say "I want to refer to a type I already defined (don't use the sugar for implicitly defining-or-reusing)" whereas what we're talking about here is the sugar.

It's adding a separate dimension of sugar to that existing syntax. Wherever the bare syntax has (type $t) (or (func $t), (table $t), etc) for referencing an entity (functions, exports, segments, various instructions), we allow adding additional projections. The existing sugar for "type uses" in functions or block types is independent of that and can coexist.

Another way of avoiding any ambiguity would be using export names instead of the local indices for projection. I was wondering about that anyway, because that would be cleaner and simpler in a number of ways. In particular, if we change the instantiate construct to use names instead of positions for imports (like the explainer is suggesting in one place), then aliases, as the kind of dual for exports, should probably work analogously.

Roger regarding projections: I just wanted to check that point.

Regarding using export names: I was on the fence about that too. I had thought it was vaguely "un-wasm" to use names when indices were possible. If string size were to become a binary-size problem, it would probably be due equally or more to module types than to instantiate/alias. If necessary, we could add a string section that everything else could reference as a uniform fix. It's also an interesting point about instantiate/alias being sort of "duals" to imports/exports.

Remembering all of Alex's notes on the subtleties of indices vs. names, maybe it would also be simpler to implement using just names instead of indices-mapping-to-names everywhere. @alexcrichton WDYT?

At least from implementing an initial text parser for module linking, I found the $i.$j syntax to be pretty easy to implement and work with. It's true that $i.$j is already a valid identifier and could clash, but the way I currently implemented it is that if $i.$j is not otherwise defined in a module, then you try to desugar it. This does require a "collect the defined names" pass, but that's at least code-wise pretty simple (not sure about spec-wise).

For the concern about the identifier bound in multiple contexts, from the text implementation I've got so far I'm not sure how this clashes. For example this module:

(module
  (import "" (instance $i
    (export "f" (func $j))
    (export "g" (global $j i32))
  ))

  (export "f" (func $i.$j))
  (export "g" (global $i.$j))
)

gets compiled with two injected aliases. AFAIK the text format always knows which namespace an identifier is being referred to in, so it's possible to do the desugaring on a per-namespace basis. Is there a concern, though, that eventually there may be identifier contexts where a parser can't tell which namespace is being referred to? Otherwise, I'm not entirely sure what ambiguity you're referring to.
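A minimal Python sketch of per-namespace desugaring, assuming a toy model where each instance type is a per-namespace table of identifiers. INSTANCE_TYPES, desugar and candidate_namespaces are hypothetical names, not part of any existing parser. The syntactic position (func vs global) selects the namespace, so the same $i.$j resolves to two different exports. Only without that context does the reference have more than one candidate.

```python
from typing import Dict, List, Tuple

# Hypothetical model of the instance type imported as $i in the example above:
# inside it, the identifier $j is bound in two different namespaces.
INSTANCE_TYPES: Dict[str, Dict[str, Dict[str, str]]] = {
    "$i": {
        "func": {"$j": "f"},    # (export "f" (func $j))
        "global": {"$j": "g"},  # (export "g" (global $j i32))
    },
}

# An injected alias: (namespace, instance identifier, export name).
Alias = Tuple[str, str, str]


def desugar(ref: str, namespace: str, aliases: List[Alias]) -> Alias:
    """Desugar a dotted reference like "$i.$j" used where the parser expects
    an entity of `namespace`, recording the alias it would inject."""
    inst, item = ref.split(".", 1)
    alias = (namespace, inst, INSTANCE_TYPES[inst][namespace][item])
    aliases.append(alias)
    return alias


def candidate_namespaces(ref: str) -> List[str]:
    """Without syntactic context, list every namespace the reference resolves in."""
    inst, item = ref.split(".", 1)
    return [ns for ns, names in INSTANCE_TYPES[inst].items() if item in names]


injected: List[Alias] = []
print(desugar("$i.$j", "func", injected))    # ('func', '$i', 'f')
print(desugar("$i.$j", "global", injected))  # ('global', '$i', 'g')
print(candidate_namespaces("$i.$j"))         # ['func', 'global']
```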

For (func $i $j) syntaxes, that may not work well for many instructions. For example, the text format simply has call $foo rather than call (func $foo). While call $foo $bar may make sense, it gets really weird for something like memory.copy $a $b: is $a an instance with memory $b, or is it copying from memory $a to memory $b?

I imagine that we could always support $i.$j.$k, and if necessary we could even support $parent.$i as a special syntax. Again, though, I'm assuming that shadowing is fine and expected, which may not be what everyone expects.

Oh, sorry, I forgot to respond to the indices-and-names point as well. After reading and rereading the comments here, though, I'm not entirely sure what the issue or the proposal is. Is the idea to change the binary format? Or just the text format's syntax that gets desugared later? (Or maybe both?)

if $i.$j is not otherwise defined in a module then you try to desugar it.

I can see how the rule of resolving ambiguity by giving precedence to an explicit binding works in simple cases, but it is insufficient for nested projections like $i.$j.$k. Imagine a module that explicitly binds both $i and $i.$j, and where $i explicitly binds a nested $j.$k, while $i.$j explicitly binds a nested $k. What does $i.$j.$k mean then?
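This scenario can be reproduced with a minimal Python sketch, assuming a toy environment where a dotted identifier maps either to a leaf entity label or to the nested environment of an instance. The function resolutions and the labels "A" and "B" are hypothetical. Because $i.$j.$k is not explicitly bound as a whole, the "explicit binding first, otherwise desugar" rule falls back to desugaring. Both ways of splitting the reference then succeed.

```python
from typing import Dict, List, Union

# Toy environment: a (possibly dotted) identifier maps either to a leaf
# entity label or to the nested environment of an instance it names.
Env = Dict[str, Union[str, "Env"]]


def resolutions(env: Env, segments: List[str]) -> List[str]:
    """Return every leaf entity a dotted reference can denote, trying each
    split into an explicitly bound prefix plus a projected remainder."""
    results: List[str] = []
    for n in range(1, len(segments) + 1):
        prefix = ".".join(segments[:n])
        if prefix not in env:
            continue
        bound, rest = env[prefix], segments[n:]
        if not rest and isinstance(bound, str):
            results.append(bound)  # the whole reference is bound here
        elif rest and isinstance(bound, dict):
            results.extend(resolutions(bound, rest))  # project into the instance
    return results


# The module described above: it binds both $i and $i.$j; $i binds a nested
# $j.$k, while $i.$j binds a nested $k.
module: Env = {
    "$i": {"$j.$k": "A"},
    "$i.$j": {"$k": "B"},
}
print(resolutions(module, ["$i", "$j", "$k"]))  # ['A', 'B']
```

Any deterministic rule would have to pick one of these two readings arbitrarily.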

Of course, all this can be defined by some arbitrary rules, but I think it's preferable to steer clear of the rabbit hole of ambiguous syntax, especially in the Wasm text format, which is meant to prioritise determinedness over convenience.

I'd also argue that a piece of syntax that actually affects index allocation deserves a bit more explicit syntax than just being an ordinary identifier.

the text format simply has call $foo rather than call (func $foo)

Right, for those places we would introduce the (func _) prefix (having it everywhere would be more consistent anyway, and not having it already bit us in the past) but continue to allow omitting it as a syntactic shorthand if the index is simple.

(There already is some precedent for allowing both forms: in the typed reference proposal (ref $t) is a shorthand for (ref (type $t)).)

After reading and rereading the comments here though I'm not really entirely sure what the issue or what the proposal is?

The idea would be that aliases for projections are defined as (alias $i (func "name")). That's consistent with how Wasm identifies exports elsewhere. That change would affect both binary and text format, but would be a more significant simplification to the latter: resolution of symbolic names remains flat and doesn't need to track nested dependent environments (and e.g. avoids questions about global vs local name spacing of these). It also simplifies validators somewhat, since the order of members in module/instance signatures becomes irrelevant and need not be maintained in their representation -- which could be a benefit for type canonicalisation in an engine, for example.
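A minimal Python sketch of the canonicalisation benefit described here. It assumes an instance signature can be reduced to a list of (export name, kind) pairs with unique names. Real signatures carry full types, and positional_key and named_key are hypothetical helpers. With index-based projection, member order is part of the type. With name-based projection, only the name-to-kind mapping matters, so sorting by name yields a canonical form.

```python
from typing import List, Tuple

Export = Tuple[str, str]  # (export name, entity kind), simplified


def positional_key(exports: List[Export]) -> Tuple[Export, ...]:
    # Index-based projection: the position of each member is observable,
    # so the order must be preserved in the type's representation.
    return tuple(exports)


def named_key(exports: List[Export]) -> Tuple[Export, ...]:
    # Name-based projection: only name -> kind matters (names are unique),
    # so a canonical form can simply sort by export name.
    return tuple(sorted(dict(exports).items()))


a = [("f", "func"), ("g", "global")]
b = [("g", "global"), ("f", "func")]
print(positional_key(a) == positional_key(b))  # False
print(named_key(a) == named_key(b))            # True
```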

Aha yeah with multi-level projections I can see where the ambiguity comes into play, thanks for explaining!

I was testing the impact of #22 on my text parser and found there was still more complexity than I'd like around handling aliases and export indices during name resolution. I think using strings would solve that part, because the sugar wouldn't require as much processing.

So to make sure I understand: the idea is that every identifier like $foo can optionally also be written as (item $foo), where item is the kind of item it references, and you can further write (item $foo "bar" "baz"), where $foo is an instance, "bar" is an instance exported from $foo, and "baz" is an item exported from "bar". Is that right?
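Name-based projection needs a check that such a path is well-formed, which can be sketched in Python. It assumes a simplified instance type that maps each export name either to an entity kind or to a nested instance type. check_projection and the example instance are hypothetical. Every name except the last must export an instance, and the last must export an item of the requested kind. No local index or identifier environment of the instance is involved.

```python
from typing import Dict, Union

# Simplified instance type (hypothetical): export name -> entity kind
# ("func", "global", ...) or a nested instance type.
InstanceType = Dict[str, Union[str, "InstanceType"]]


def check_projection(kind: str, instance: InstanceType, *names: str) -> bool:
    """Check (kind $inst "n1" ... "nk"): each intermediate name must export an
    instance, and the final name must export an item of `kind`."""
    current: Union[str, InstanceType] = instance
    for name in names:
        if not isinstance(current, dict) or name not in current:
            return False
        current = current[name]
    return current == kind


foo: InstanceType = {"bar": {"baz": "func", "mem": "memory"}, "qux": "global"}
print(check_projection("func", foo, "bar", "baz"))    # True
print(check_projection("global", foo, "bar", "baz"))  # False: wrong kind
print(check_projection("func", foo, "qux", "baz"))    # False: "qux" is not an instance
```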

If so I think that'd be a great change to make. I definitely found it odd in the implementation that "here's an index and a type of item" sometimes means "the index is in the module's index space" and sometimes means "the index is in the export index space".

Yup, that would be the idea. The main downside is that strings take more encoding space, but if that's an issue we could introduce a string table. For now, though, I wouldn't worry about that.

Ok great; sounds like we should make both these changes. (I'll plan to create a PR for these and other updates in the other concurrent issues we're discussing next week.)

Sounds good. Is the idea to make instantiate name-based at the same time, and properly rid the proposal of order-significant imports/exports?

Yup!

Incidentally, I realized that writing aliases like:

(alias $f (func $instance "export"))

is the reverse of how we write:

(import "f" (func $f))

Rather, it seems like we should write:

(alias $instance "export" (func $f))

where the $instance "export" fields of an alias are analogous to the "f" field of an import.

Does that make sense?

That sounds great to me 👍

Cool, thanks. (I think @rossberg is out until the new year, so I'll forge ahead with this understanding, but happy to adjust later.)

Ok, so getting to outer (formerly "parent") aliases (#20), it seems like they'd maybe look like:

(alias (outer $module $def) (type $f))

But now the symmetry with imports seems weaker, so I wonder if maybe I should abandon trying to maintain it. Also, I keep coming back to the feeling that it's asymmetric to have $instance "export" not wrapped in parens while (outer $module $id) is wrapped in parens, when these are two symmetric kinds of aliases.

From first principles, something like this seems most symmetric:

(alias (type $t1 (export $instance "T")))
(alias (type $t2 (outer $module $T)))
(alias (module $m1 (export $instance "M")))
(alias (module $m2 (outer $module $M)))

But does the inline-alias sugar become ((func (export $instance "foo")))? That seems pretty verbose. My inclination for #20 is to say that we keep the ability for $foo to simply refer directly to enclosing scopes, so we don't need the export/outer distinction in the sugar and we can stay with (func $instance "foo"), as suggested above. But I wonder if that's too irregular? 🤷

What you're proposing for the updated syntax makes sense to me. A foray into the implementation in bytecodealliance/wasm-tools#176 left me feeling (alias 0 "foo" (func)) was pretty alien syntax, whereas (alias (func (export 0 "func"))) seems a bit clearer.

In thinking about this, though, I personally warmed up quite a lot to an explicit syntactic construct for the sugar, rather than, for example, implicitly pulling in $foo from the parent. Selfishly, it makes the text parser much easier to understand, but I also found it easy to forget that something was referenced from the parent, or to reference something from there by accident.

For the shorthand syntax, I personally liked the (func $instance "export") suggestion, with something like (type outer $module $modules_name) for parent aliases? I don't feel strongly about having much symmetry here.

Ok, thanks for the feedback. So then perhaps these as the explicit aliases:

(alias (type $t1 $instance "T"))
(alias (type $t2 outer $module $T))
(alias (module $m1 $instance "M"))
(alias (module $m2 outer $module $M))

and then, as the inline sugar, (func $instance "export") and (type outer $module $type)?
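A minimal Python sketch of how an inline use could be hoisted into an explicit alias of the form above. Two assumptions apply: the text is handled as pre-tokenized strings, and fresh identifiers are derived from the kind plus a counter. The function hoist and that naming scheme are hypothetical. Each inline form is replaced by a plain reference to a fresh identifier, and the matching explicit alias definition is collected.

```python
from itertools import count
from typing import Iterator, List, Tuple


def hoist(kind: str, target: Tuple[str, ...], aliases: List[str],
          fresh: Iterator[int]) -> str:
    """Replace inline sugar such as (func $instance "export") or
    (type outer $module $type) with a reference to a fresh identifier,
    appending the equivalent explicit alias definition to `aliases`."""
    name = f"${kind}{next(fresh)}"  # hypothetical fresh-name scheme
    aliases.append(f"(alias ({kind} {name} {' '.join(target)}))")
    return f"({kind} {name})"


aliases: List[str] = []
fresh = count()
print(hoist("func", ("$instance", '"export"'), aliases, fresh))      # (func $func0)
print(hoist("type", ("outer", "$module", "$type"), aliases, fresh))  # (type $type1)
print("\n".join(aliases))
# (alias (func $func0 $instance "export"))
# (alias (type $type1 outer $module $type))
```

Both alias kinds share one code path here because the explicit syntax puts the fresh identifier in the same position for export and outer aliases.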

Sounds and looks reasonable to me!

Fixed with #26