WebAssembly/module-linking

allow parent functions to be passed to nested instances?

Closed this issue · 5 comments

Currently, the explainer disallows nested instances from importing any of the parent module's function, memory, table or global definitions. E.g., you can't write:

(module $M
  (func $foo ...)
  (instance $j (instantiate ... (func $foo))))

The reason is that $foo closes over $M's instance which is created after the $j instance. With memories, tables and non-immutable-ref.func-initialized globals, there is no such problem, so we could perhaps allow them (which feels a bit irregular, which is why I didn't put this in the initial explainer writeup... but we could).

There is a coherent way that could allow functions too, though, which would make the instantiation rules nicely uniform. But I'm not convinced it's a good idea; I mostly just wanted to post the idea for discussion and future reference.

The observation is that there's only a problem in $M if $j calls $foo during $j's start function. But calling imports from start functions is generally a bad idea; start functions should only do instance-internal stuff like setting up memory/tables/etc. So what if $foo was initially created in a state with no instance, such that calling $foo traps, and once $M's instance is created, $foo is updated to be a proper callable funcinst. Then $foo could be passed to $j as above, and as long as $j didn't call $foo during its start function, everything would be peachy.

Incidentally, functions are already set up in the spec to be "stateful" in this manner, since a funcaddr is the address of a funcinst in the store, which means that the funcinsts can technically be mutated just like memories/tables/globals (not that we should, but we could ;).

Performance/complexity is obviously a concern, but I think this could be pretty simple and cheap. Basically, for the small subset of functions locally statically observed to be passed into instantiate, their prologue would start with a cheap branch.
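To make the "trap until the parent instance exists" idea concrete, here is a rough Python model of it. Nothing here comes from the explainer or the spec: `DeferredFunc`, `bind`, `instantiate_child` and the dictionary-shaped instances are hypothetical names chosen for illustration, and the `if` in `__call__` stands in for the cheap prologue branch.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap."""


class DeferredFunc:
    """Hypothetical model of $foo: created before its instance exists."""

    def __init__(self, body):
        self._body = body        # callable taking (instance, *args)
        self._instance = None    # no instance yet, so any call traps

    def bind(self, instance):
        # Models "once $M's instance is created, $foo is updated".
        self._instance = instance

    def __call__(self, *args):
        # The cheap branch in the prologue.
        if self._instance is None:
            raise Trap("function called before its instance exists")
        return self._body(self._instance, *args)


def instantiate_child(imported_func, call_in_start):
    """Models instantiating $j; optionally calls the import from 'start'."""
    if call_in_start:
        imported_func(1)  # traps, because $M's instance does not exist yet
    return {"run": lambda x: imported_func(x)}


foo = DeferredFunc(lambda inst, x: x + inst["offset"])
j = instantiate_child(foo, call_in_start=False)  # $j created first
foo.bind({"offset": 41})                         # then $M's instance appears
result = j["run"](1)                             # safe to call from now on
```

In this model, passing `call_in_start=True` raises `Trap` during the child's instantiation, which is the one case the proposal would turn into a runtime error.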

One use case would be if you have a parent module P that wants to reuse a child module C that imports callback functions as function imports (say, C is a hash table and the callback is the hash function) and P wants to supply its own callbacks. Without the above relaxation, P will have to resort to gross indirections like those in the dynamic linking cyclic dependencies example, which will be significantly slower than the "branch in prologue" implementation mentioned above. But I'm not sure how compelling this use case is?

Incidentally, with the Interface Types rebase onto Module Linking, there is a very strong use case since this situation arises with every import adapter. In particular, import adapters both need to be imported by the nested core wasm module (so they can adapt its imports) and be able to call back into core wasm (for malloc). But this relaxation could be kept to the Interface Types layer and out of core wasm, so this isn't necessarily a use case either.

Thoughts?

Hm, yeah, I'm also not convinced this is a good idea. Mutating the meaning of functions certainly is problematic for reasoning about the meaning of a program.

FWIW, functions are not currently allowed to be mutated by the spec. That is ruled out by the store extension relation. We could of course change that, but that would imply that the host suddenly gets a license to mutate Wasm functions willy-nilly as well. It's not clear how one can be allowed without the other.

If we want to allow supplying local functions (which I agree is desirable), then it seems more attractive to apply some ordering constraints. I.e., limit what a function body can forward-reference, in a way similar to what a nested module body can forward-reference. For example, by allowing multiple function sections to be interleaved with the others.

In general, it seems unavoidable to depart from Wasm's MVP approach, where everything in a module is mutually recursive, and move to a more restricted model where mutual recursion is limited to individual sections. It would only be natural if that extended to all kinds of definitions.

Ordering constraints are an interesting idea, but in the more complicated "outer module adapting an inner module's imports" use cases I'm thinking about, the outer module's import adapter code needs to recursively call back into the inner module's instance.

One alternative to mutating the function is to say that, when a parent function is passed to instantiate, it is permanently wrapped with a function that dynamically dispatches to a ref cell that is initially null and gets updated with the funcinst (which is roughly what the implementation would be).
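A minimal Python sketch of that wrapper, assuming a store where function instances stay immutable and only a separate cell is written. `FuncInst`, `RefCell` and `wrap` are hypothetical names for illustration, not spec constructs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


@dataclass(frozen=True)
class FuncInst:
    """Immutable function instance: a closure over (instance, code)."""
    instance: dict
    code: Callable


class RefCell:
    """The only mutable piece; starts out null."""

    def __init__(self):
        self.value: Optional[FuncInst] = None


def wrap(cell):
    """Return a permanent wrapper that dispatches through the cell."""
    def wrapper(*args):
        target = cell.value
        if target is None:
            raise Trap("null ref cell")
        return target.code(target.instance, *args)
    return wrapper


cell = RefCell()
wrapped_foo = wrap(cell)  # this is what the nested instance would import
# ... the nested instance is created here, holding wrapped_foo ...
cell.value = FuncInst(instance={"base": 10},
                      code=lambda inst, x: inst["base"] + x)
value = wrapped_foo(5)
```

The difference from mutating the function is that the `FuncInst` never changes after creation; the observable state change is confined to the cell.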

Thinking about it some more, perhaps my use case can be broken down into multiple acyclic instances after all.

So my understanding of what you're suggesting is that (instance (instantiate $M (func $f))) can refer to $f preceding the instance definition in text/binary-format order? For this to successfully avoid cycles, the funcinst closure must similarly only refer to definitions preceding the same instance definition. Naively this would work by saying that the function section can be interleaved with all the other alias/instance/etc sections and that a function body is then validated against the index space as it existed at the end of its declaring function section, which requires snapshotting the size of all the index spaces at the end of each function section, symmetric to what we have to do with nested modules (but only for the type/module index spaces). However, the MVP binary format currently puts the function section before the memory/table/global sections, so the above rule would retroactively invalidate existing MVP modules. So I think we'd need a somewhat more permissive rule, e.g., allowing index spaces up to the next instance definition (or end of module).
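As a sanity check on the naive rule, here is a small Python sketch of the snapshotting. The module encoding is invented for illustration: a module is a list of `(kind, items)` sections, where a `"func"` section carries a list of bodies (each body being a list of `(index_space, index)` references) and any other section just carries a count of definitions.

```python
from collections import Counter


def validate(sections):
    """Check the naive rule: each function body may only reference indices
    that exist at the end of its own declaring function section."""
    sizes = Counter()
    for kind, items in sections:
        if kind == "func":
            # Functions in the same section may reference each other.
            sizes["func"] += len(items)
            snapshot = dict(sizes)  # index-space sizes at end of section
            for body in items:
                for space, index in body:
                    if index >= snapshot.get(space, 0):
                        return False
        else:
            sizes[kind] += items  # non-function sections carry a count
    return True


# A function defined after the instance may refer to it.
ok = validate([
    ("func", [[("func", 0)]]),
    ("instance", 1),
    ("func", [[("instance", 0), ("func", 0)]]),
])

# A function defined before the instance may not.
bad = validate([
    ("func", [[("instance", 0)]]),
    ("instance", 1),
])

# MVP-style layout: function section first, memory section later.
mvp = validate([
    ("func", [[("memory", 0)]]),
    ("memory", 1),
])
```

Under this model `ok` is accepted while `bad` and `mvp` are rejected; the last case is the MVP-compatibility problem that motivates the more permissive rule.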

All that seems a bit complicated, and equivalent to what you could express by introducing synthetic nested modules/instances with the current strict rules, so I'm inclined to back off this relaxation.
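For illustration, the synthetic-instance arrangement can be modeled in Python as plain acyclic instantiation order. The three functions below are hypothetical stand-ins (for an auxiliary module holding `$foo`, the child `$j`, and the parent `$M`), not module-linking syntax.

```python
def instantiate_aux():
    """Synthetic nested module that holds what used to be $M's $foo."""
    state = {"calls": 0}

    def foo(x):
        state["calls"] += 1
        return x * 2

    return {"foo": foo, "state": state}


def instantiate_child(imports):
    """Stands in for $j: imports one function and exports a user of it."""
    func = imports["func"]
    return {"apply_twice": lambda x: func(func(x))}


def instantiate_parent():
    """Stands in for $M: every dependency already exists when it is used."""
    aux = instantiate_aux()                      # first, no dependencies
    j = instantiate_child({"func": aux["foo"]})  # imports aux's function
    return {"aux": aux, "j": j}                  # $M's own instance is last


m = instantiate_parent()
out = m["j"]["apply_twice"](3)
```

Because `foo` closes over the auxiliary instance rather than over `$M`'s instance, no function is ever observable before its instance exists.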

Yeah, reconciling this with the current semantics of full recursion among everything will be a bit ugly. Maybe being forced to introduce an auxiliary instance is good enough. We can still think about relaxing it later.

Yes, agreed we can always relax later. I'll close the issue for now, but we can reopen it if new motivations arise.