WebAssembly/module-linking

Intermingled import/type sections and elaboration to single-level instance imports

Status: closed

One thought I had over the weekend is that I'm not sure the proposed import elaboration rule will necessarily work for all module-linking modules. For example:

(module
  (import "" "" (func))
  (type $t (func))
  (import "" "a" (func (type $t)))
)

If we were to blindly elaborate this, we would get:

(module
  (import "" (instance
    (export "" (func))
    (export "a" (func (type $t)))
  ))
  (type $t (func))
)

This would be invalid, at least in the binary format, because the instance type uses $t before $t is defined. I think there are also thorny questions once we start importing types and referring to them; things get weird.
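To make the ordering problem concrete, here is a minimal Python model (not any toolchain's actual implementation) of the blind elaboration above. Two-level imports are grouped by module name into a single instance import placed where the first such import appeared, and a checker reports type names used before their definition. The names `Import`, `InstanceImport`, `elaborate`, and `forward_type_refs` are hypothetical, and the model ignores the text format's implicit type definitions for inline `(func)` signatures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TypeDef:
    name: str                        # symbolic type name, e.g. "$t"

@dataclass
class Import:                        # two-level import: (import "mod" "name" ...)
    module: str
    name: str
    type_ref: Optional[str] = None   # explicit (type $t) use, if any

@dataclass
class InstanceImport:                # single-level import: (import "mod" (instance ...))
    module: str
    exports: List[Tuple[str, Optional[str]]] = field(default_factory=list)

def elaborate(fields):
    """Blindly group two-level imports by module name into one instance import,
    placed where the first import with that module name appeared."""
    out, by_module = [], {}
    for f in fields:
        if isinstance(f, Import):
            if f.module not in by_module:
                by_module[f.module] = InstanceImport(f.module)
                out.append(by_module[f.module])
            by_module[f.module].exports.append((f.name, f.type_ref))
        else:
            out.append(f)
    return out

def forward_type_refs(fields):
    """Return type names used before the field that defines them."""
    defined, bad = set(), []
    for f in fields:
        if isinstance(f, TypeDef):
            defined.add(f.name)
            continue
        refs = ([f.type_ref] if isinstance(f, Import)
                else [ref for _, ref in f.exports])
        bad += [r for r in refs if r is not None and r not in defined]
    return bad

# The example module: import, type $t, import that uses $t.
original = [Import("", ""), TypeDef("$t"), Import("", "a", "$t")]
print(forward_type_refs(original))              # []
print(forward_type_refs(elaborate(original)))   # ['$t']
```

In textual order the original is fine, but once the second import is pulled up into the instance type, `$t` is used before it is defined.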

This leads me to ask: at which layer is this validation expected to happen? Is it imagined that we'll handle this at the binary decoding layer in the spec? Elsewhere? Handling it at the binary encoding layer seems difficult because it would affect the index spaces, so is this perhaps just some extra validation on module names, plus a matter of how the directives are interpreted for name-based instantiation?

If we just consider the binary format, then this is actually a non-issue, since all types are currently defined before any import. And with the extension to multiple sections, we don't need to allow imports with the same module name to be split across multiple import sections. In other words, a module that does so is malformed.
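A sketch of the proposed rule as a binary-level check, assuming a simplified model where each section is a `(kind, payload)` pair and an import section's payload lists `(module, name)` pairs. The function name `split_import_modules` is hypothetical; it returns the module names that would make a module malformed under this rule.

```python
def split_import_modules(sections):
    """sections: list of (kind, payload) in binary order. For "import"
    sections, payload is a list of (module_name, field_name) pairs.
    Returns module names whose imports span more than one import section."""
    first_seen = {}   # module name -> ordinal of the import section it first appeared in
    split = set()
    ordinal = 0       # counts import sections only
    for kind, payload in sections:
        if kind != "import":
            continue
        for module_name, _field in payload:
            if first_seen.setdefault(module_name, ordinal) != ordinal:
                split.add(module_name)
        ordinal += 1
    return sorted(split)

ok = [("type", ["(func)"]), ("import", [("", ""), ("", "a")])]
bad = [("import", [("", "")]), ("type", ["(func)"]), ("import", [("", "a")])]
print(split_import_modules(ok))    # []
print(split_import_modules(bad))   # ['']
```

Because every module name lives in exactly one import section, grouping two-level imports into a single instance import never has to reach across sections.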

But there is a general problem with the text format: how do we represent multiple sections in text? Currently, all declarations of a given kind can be arbitrarily split and interleaved with others, and the meaning is that they are all concatenated into a single section. The interleaving of declarations of different kinds does not affect meaning. Under that interpretation, your example would still be fine.

However, do we want to keep that interpretation for backwards compatibility? Or do we change the meaning of the text format so that it expresses multiple sections, and their ordering, in a natural fashion? As your example shows, the latter could be a breaking change to the text format, depending on the choice of semantics.
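For reference, a minimal model of the current interpretation: every field of a given kind is concatenated into one section, and sections follow the fixed MVP order, so interleaving in the text has no effect. `MVP_ORDER` is a simplified subset of the MVP section order and `encode_mvp` is a hypothetical name.

```python
# Text-format field kinds in the order their MVP binary sections appear
# (simplified: a text `func` stands for both the function and code sections).
MVP_ORDER = ["type", "import", "func", "table", "memory", "global",
             "export", "start", "elem", "data"]

def encode_mvp(fields):
    """fields: (kind, item) pairs in textual order.
    Returns one (kind, items) section per kind present, in MVP order;
    items keep their relative textual order within each kind."""
    buckets = {kind: [] for kind in MVP_ORDER}
    for kind, item in fields:
        buckets[kind].append(item)
    return [(kind, items) for kind, items in buckets.items() if items]

# The first example in this issue: import, type $t, import using $t.
example = [("import", '"" ""'), ("type", "$t"), ("import", '"" "a"')]
print(encode_mvp(example))
# [('type', ['$t']), ('import', ['"" ""', '"" "a"'])]
```

Since the type section always precedes the import section here, the example stays valid no matter where `$t` is written.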

The alternative would be to introduce an explicit text notation for expressing section boundaries or relative ordering in some form. Unfortunately, I don't have a good idea for such a notation.

Ah yes, that's a good point: with current wasm we can finagle our way out of this. I'm still a bit worried, though, about how this interacts with module linking itself.

I think there'd still be a possible index space clash with something like:

(module
  (import "" "" (func)) ;; func 0
  (import "a" (instance)) ;; instance 0
)

where this gets elaborated (ignoring type indices) to:

(module
  (import "" (instance (export "" (func)))) ;; instance 0
  (alias (instance 0) (func 0)) ;; func 0
  (import "a" (instance)) ;; instance 1 -- this changed
)

So by desugaring into instance imports, aren't we altering the instance index space?

My current interpretation of the text format is that all sections that can be interleaved are encoded in the order they appear in the textual module, with adjacent items of the same kind coalesced into the same section. This is definitely a breaking change for the text format, however: the following module works just fine today, but under this interpretation its import would refer to type 0 before the type section that defines it:

(module
  (import "" "" (func (type 0)))
  (type (func))
)

Currently, the text parser I'm working on works around this by falling back to an MVP-style encoding, with types listed first, when no aliases, instances, or modules are found. I think this is perhaps also related to WebAssembly/annotations#11, in that it's likely best to figure out a new kind of text format for these interleaved sections rather than trying to bolt them onto the existing text format.
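A sketch contrasting the two encodings discussed above, under an assumed model where each import is represented only by the type index it uses. `coalesce` implements the ordered interpretation (adjacent same-kind fields merged into one section); `encode_with_fallback` approximates the described workaround by checking only top-level field kinds, which may differ from what the actual parser checks. All names here are hypothetical.

```python
# Simplified MVP section order (a text `func` stands for function + code).
MVP_ORDER = ["type", "import", "func", "table", "memory", "global",
             "export", "start", "elem", "data"]
# Field kinds that only exist with module linking (simplified check).
LINKING_KINDS = {"module", "instance", "alias"}

def coalesce(fields):
    """Encode fields in textual order, merging runs of adjacent
    same-kind fields into one section."""
    sections = []
    for kind, item in fields:
        if sections and sections[-1][0] == kind:
            sections[-1][1].append(item)
        else:
            sections.append((kind, [item]))
    return sections

def encode_with_fallback(fields):
    """Use ordered coalescing only when module-linking kinds are present;
    otherwise sort fields (stably) into MVP section order, types first."""
    if any(kind in LINKING_KINDS for kind, _ in fields):
        return coalesce(fields)
    return coalesce(sorted(fields, key=lambda f: MVP_ORDER.index(f[0])))

def undefined_type_uses(sections):
    """Return type indices that imports use before a type section defines them.
    Import items are modeled as the type index they reference."""
    defined, bad = 0, []
    for kind, items in sections:
        if kind == "type":
            defined += len(items)
        elif kind == "import":
            bad += [t for t in items if t >= defined]
    return bad

# The module from the example: an import using type 0, then the type itself.
module = [("import", 0), ("type", "(func)")]
print(undefined_type_uses(coalesce(module)))              # [0]
print(undefined_type_uses(encode_with_fallback(module)))  # []
```

The ordered encoding breaks the existing module, while the fallback keeps it valid as long as no module-linking fields appear.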

Desugaring is, by definition, a shorthand: a macro-like expansion that is not supposed to affect other parts of the program. So other indexing ought to already account for this expansion, not be changed by it. That is, in your first example, the first import binds an instance in both forms, so the second import always defines instance 1, regardless of which notation is used to express instance 0, sugared or desugared. If somebody uses raw indices (as opposed to symbolic ones), they simply have to understand the implied meaning of any sugar.
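The point that sugar must not shift other indices can be modeled as follows: a two-level import implicitly consumes an instance index (once per module name) exactly as the explicit instance import would, so later fields get the same indices in either form. The field shapes and `assign_indices` are a hypothetical model, not proposal syntax.

```python
def assign_indices(fields):
    """Assign index-space positions to fields in textual order.

    Field shapes (hypothetical model):
      ("import2", module, name, kind)            two-level import (sugar)
      ("import_instance", module)                single-level instance import
      ("alias", instance_index, export, kind)    alias of an instance export
    Returns {space: [labels]}, where a label's list position is its index.
    """
    spaces, bound = {}, set()
    for f in fields:
        if f[0] == "import2":
            _, module, name, kind = f
            if module not in bound:
                # The sugar implicitly binds one instance per module name,
                # so it consumes an instance index just like the desugared form.
                bound.add(module)
                spaces.setdefault("instance", []).append(module)
            spaces.setdefault(kind, []).append((module, name))
        elif f[0] == "import_instance":
            spaces.setdefault("instance", []).append(f[1])
        elif f[0] == "alias":
            _, instance_index, export, kind = f
            source = spaces["instance"][instance_index]
            spaces.setdefault(kind, []).append((source, export))
    return spaces

sugared = [("import2", "", "", "func"), ("import_instance", "a")]
desugared = [("import_instance", ""), ("alias", 0, "", "func"),
             ("import_instance", "a")]
print(assign_indices(sugared) == assign_indices(desugared))  # True
print(assign_indices(sugared)["instance"].index("a"))        # 1
```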

Regarding "I think this is somewhat related to WebAssembly/annotations#11":

I see what you mean. FWIW, the positioning scheme proposed there is no longer sufficient when we allow duplicate sections. It probably needs to number the occurrences of the respective sections.
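One way occurrence numbering could work, sketched in Python: each section is labeled `(kind, n)` for the n-th section of that kind, and a custom section is placed relative to such a label. The `(kind, n)` labels and the helper names are hypothetical.

```python
def occurrence_labels(section_kinds):
    """Label each section as (kind, n): the n-th section of that kind, from 0."""
    counts, labels = {}, []
    for kind in section_kinds:
        n = counts.get(kind, 0)
        labels.append((kind, n))
        counts[kind] = n + 1
    return labels

def place_custom_before(section_kinds, custom_name, anchor):
    """Return a new section list with a custom section inserted immediately
    before the section labeled `anchor`, e.g. ("import", 1).
    Raises ValueError if no section has that label."""
    pos = occurrence_labels(section_kinds).index(anchor)
    return section_kinds[:pos] + [f"custom:{custom_name}"] + section_kinds[pos:]

kinds = ["type", "import", "alias", "import"]
print(occurrence_labels(kinds))
# [('type', 0), ('import', 0), ('alias', 0), ('import', 1)]
print(place_custom_before(kinds, "x", ("import", 1)))
# ['type', 'import', 'alias', 'custom:x', 'import']
```

A bare `(before import)` would be ambiguous for `kinds`, since there are two import sections; the occurrence number disambiguates.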

It would be rather annoying if the text format required complicated section descriptions. Hopefully we can find something that is at least optional and has natural (yet backwards-compatible) defaults when omitted.

Ah, OK, that makes sense! I had assumed that modifying the index spaces wasn't intended, but giving all existing modules pseudo-instances makes sense to me.

As one (admittedly bad) strawman syntax for the text format, we could say something like the following.

All bare single-item type and import statements are sorted to the front of the module (as they are today), and for the new sections we write something like:

(module
  (imports
    (import "" (instance))
  )
  (aliases
    (alias (instance 0) (func 0))
  )
  ;; ...
)

And if we wanted to get fancy, we could do:

(module
  (imports $foo
    (import "" (instance))
  )
  (@custom "x" (before $foo) "...")
)
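A sketch of how the strawman above might be laid out into sections, assuming the hypothetical field shapes in the docstring. Bare `type` and `import` fields are hoisted to the front as today, each explicit group becomes one section in textual order, and a custom section with `(before $label)` is emitted just before the group carrying that label. This is only one possible reading of the strawman.

```python
def layout(fields):
    """Hypothetical layout for the strawman text syntax.

    Field shapes (a model, not real syntax):
      ("type", item) / ("import", item)   bare single-item fields
      ("group", kind, label, items)       e.g. (imports $foo ...); label may be None
      ("custom", name, before_label)      e.g. (@custom "x" (before $foo) ...)
    """
    bare = {"type": [], "import": []}   # hoisted to the front, types first
    groups, customs = [], []
    for f in fields:
        if f[0] in bare:
            bare[f[0]].append(f[1])
        elif f[0] == "group":
            groups.append(f[1:])
        elif f[0] == "custom":
            customs.append(f[1:])
    labels = {label for _, label, _ in groups if label is not None}
    for name, before in customs:
        if before not in labels:
            raise ValueError(f"custom section {name!r}: unknown label {before}")
    sections = [(kind, items) for kind, items in bare.items() if items]
    for kind, label, items in groups:
        # Custom sections anchored to this group are emitted right before it.
        sections += [("custom", [name]) for name, before in customs
                     if before == label]
        sections.append((kind, items))
    return sections

example = [("group", "import", "$foo", ['"" (instance)']),
           ("custom", "x", "$foo")]
print(layout(example))
# [('custom', ['x']), ('import', ['"" (instance)'])]
```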

I believe this has since been clarified in the text, so I consider this resolved.