WebAssembly/module-linking

Duplication required by typing each nested module

Closed this issue · 6 comments

In the current state of the proposal, adding a nested module requires adding an entry in both the module section and the module code section. The module section points to a type index which forward-declares the type of the module-to-be, and the module is later validated against that type.

In implementing this proposal, however, I've found that declaring the type of the module leads to a surprisingly large amount of implementation work:

  • Perhaps the largest part is the impact on the text format - #8 - where automatically inferring the type of inline modules requires a significant amount of code to handle. One of the main gotchas here is that the type of your exports may not even be mentioned in the module declaring the export due to aliases. On the other hand, however, when writing the text format it's extremely nice not having to list out the type of each module.

  • Having a separated module definition and declaration means we need to validate the definition against the declaration, but in addition to raising questions about subtyping - #9 - this also just leads to a lot of duplication in the binary format. If the nested module lists all of its own type information the parent will have to duplicate all of it for exports/imports. This could perhaps be alleviated with alias directives for types from the parent, but that makes reusing modules as-is from elsewhere harder.

  • Tooling written around the module linking proposal always has to handle this as well. I've so far written quite a few instances of "copy this type from this module to that module" as well as "given this module, infer its type signature and copy it over into this module". This was, for example, one of the boilerplate-y parts of writing a fuzz-case-generator. It was natural to generate a submodule, but inserting that into the outer module requires some gymnastics of extracting type signatures and moving them around.

Basically I've found there to be a larger-than-expected amount of effort to manage all these types and such. One idea I had with @lukewagner today would be to drop the module code section and simply have a module section with inline modules. Everything else about this proposal would remain effectively the same, but this would gracefully solve #8 (you no longer need to explicitly list the type of an inline module), #9 (no type checks happen at all, the type is inferred from the definition), and the issues with duplicating types in the binary format.
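To make "the type is inferred from the definition" concrete, here is a minimal sketch in Python. Everything in it (`FuncType`, `Module`, `ModuleType`, `infer_module_type`) is a hypothetical, heavily simplified model for illustration, not the proposal's binary format or any engine's data structures. It only shows that once the nested module is inline, the parent can read the module type off the definition instead of carrying a second copy of it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FuncType:
    params: tuple
    results: tuple


@dataclass(frozen=True)
class ModuleType:
    # Sorted (name, type) pairs so two equal signatures compare equal.
    imports: tuple
    exports: tuple


@dataclass
class Module:
    # Hypothetical model: a module is just its imports, exports,
    # and a list of inline nested modules.
    imports: dict = field(default_factory=dict)
    exports: dict = field(default_factory=dict)
    nested: list = field(default_factory=list)


def infer_module_type(m: Module) -> ModuleType:
    """Derive a module's type from its own definition (no separate declaration)."""
    by_name = lambda item: item[0]
    return ModuleType(
        imports=tuple(sorted(m.imports.items(), key=by_name)),
        exports=tuple(sorted(m.exports.items(), key=by_name)),
    )


# A child module defined inline inside its parent.
child = Module(
    imports={"log": FuncType(("i32",), ())},
    exports={"run": FuncType((), ("i32",))},
)
parent = Module(nested=[child])

# The parent never restates the child's signature; it is computed on demand.
nested_types = [infer_module_type(c) for c in parent.nested]
```

With a separate declaration, the parent would hold its own `ModuleType` for `child` and validation would have to compare the two, which is where the duplication and the subtyping question come from.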

Given that, can anyone else think of a strong reason why nested modules' code should come after the outer module's "header info"? Engines should still be able to compile nested modules in a streaming fashion, even if they alias parent types, because the parent context is known at the time all the nested module's code arrives.

Yeah, I'm not surprised. It is a known problem with more expressive module systems that they can sometimes require significant duplication between signatures and implementations. If we can avoid having to specify the signature separately when the implementation is present anyway, then we should do so. It would seem like a simplification in multiple ways.

I seem to recall that streaming was the main motivation for the split. If you don't declare the signatures upfront, then you cannot validate a parent module before you have validated all child modules (to infer their types), so you'll lose some potential for parallelisation and increase the span -- parallel validation time becomes O(N) in the nesting depth of modules instead of O(1). But that's probably a reasonable trade-off, since N>2 is not going to be common.
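As a toy model of that span argument (not a measurement): assume each module costs one unit to validate and there are unlimited workers. `Mod`, `span_declared`, and `span_inferred` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Mod:
    # Hypothetical model: a module is just a list of nested modules.
    children: list = field(default_factory=list)


def span_declared(m: Mod) -> int:
    """Signatures declared up front: every module can be validated
    independently, so the critical path is a single module."""
    return 1


def span_inferred(m: Mod) -> int:
    """Signatures inferred: a parent waits for its deepest chain of
    children, so the critical path grows with nesting depth."""
    return 1 + max((span_inferred(c) for c in m.children), default=0)


# Nesting depth 3: root -> child -> grandchild, plus one leaf sibling.
root = Mod([Mod([Mod()]), Mod()])
print(span_declared(root), span_inferred(root))
```

Under these assumptions the inferred case has a span equal to the nesting depth, while the sibling at the same level adds nothing because it validates in parallel.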

You already can't compile a module's functions until you read all the submodules, but you've still got the guarantee that you can validate and compile whatever part of the module you have. You never have to wait for a future chunk of the module to make progress, right? In that sense I'm not sure if any parallelism or streaming capabilities are lost?

You already can't compile a module's functions until you read all the submodules

Hm, actually, why not?

Oh, by that I mean that in stream order, if you get one byte at a time, you have to get all the submodules first (the module code section currently comes before the code section), so you'll have fully read all the submodules by the time you reach your own code. This does assume that you're receiving the wasm module as a stream, though.

While I originally proposed the Module/ModuleCode split because of some anticipated improvement in stream-ability or parallelization, that must've been based on bogus or no-longer-true assumptions because, try as I might, I can't find any concrete win from the split. (Sorry about that!) In particular:

  • Ignoring submodules, there is no meaningful win from parallelizing work before a module's Code section: everything before the Code section just defines an environment that is very cheap to decode+validate (compared to compiling a Code section) and the Code+Data section is >99% of bytes so it's not like we're wasting time by waiting for the first bytes of the Code section before parallelizing.
  • The ModuleCode section (containing submodules) is currently proposed to go before the Code section (of the parent module) because, in the most aggressive (theoretical, b/c this would be tricky to pull off) streaming+parallelization scheme, you'd want to compile the submodules before the parent so you could start executing submodule start functions while the parent was still compiling. In any less-aggressive scheme (which is what I expect all engines will actually do), the order doesn't matter: you'll compile all the code before running any start function. Thus, moving submodule definitions in-line won't change the optimal order w.r.t. the parent's Code section.
  • Because of the parent-alias ordering rules, an engine has everything it needs to create the immutable module environment (in which to perform parallel compilation of submodule function bodies) at the point of the submodule declaration. Thus, there is no wasted time during which Code bytes have been received but no parallel compilation progress is being made. If anything, moving submodule Code bytes earlier in streaming order ever-so-slightly improves overall parallelization opportunities by allowing submodule compilation to start earlier than otherwise.
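The streaming-order point can be sketched with a small Python model. The section lists below are illustrative assumptions, not the proposal's actual binary layout, and positions count whole sections rather than bytes; `first_compile_start` is a hypothetical helper.

```python
# Section kinds whose arrival lets an engine begin compiling function bodies.
COMPILABLE = {"inline_module", "module_code", "code"}


def first_compile_start(stream):
    """Map each compilable unit to the stream position where its bytes arrive."""
    starts = {}
    for position, (kind, label) in enumerate(stream):
        if kind in COMPILABLE:
            starts[label] = position
    return starts


# Split design: the submodule is declared early, its code arrives later.
split_layout = [
    ("type", "types"),
    ("import", "imports"),
    ("module", "child declaration"),
    ("function", "funcs"),
    ("export", "exports"),
    ("module_code", "child"),
    ("code", "parent"),
]

# Inline design: the submodule definition sits where the declaration was.
inline_layout = [
    ("type", "types"),
    ("import", "imports"),
    ("inline_module", "child"),
    ("function", "funcs"),
    ("export", "exports"),
    ("code", "parent"),
]

print(first_compile_start(split_layout))
print(first_compile_start(inline_layout))
```

In this model the child's code arrives at position 5 in the split layout and at position 2 in the inline layout, after the types and imports it can alias, so its compilation can be queued no later than before.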

Thus, I think sticking submodule definitions in-line preserves all streaming+parallel optimization potential and we should do it for the significant simplifications. In the worst case, I think we could backwards-compatibly add a way to optionally declare submodules out of line—but of course we'd want to see and measure a real problem first.

Done in #23.