Compile-time imports in JS-API
eqrion opened this issue · 14 comments
After the last meeting on stringref, it seemed like providing a concrete proposal for compile-time imports could aid the discussion.
Overview
The current compilation model of WebAssembly through the JS-API provides imports when instantiating a module. This prevents web runtimes from assuming anything about an import beyond its declared type while compiling, unless they resort to speculation or other tricks.
This issue proposes a minimal set of changes to allow specifying certain imports while compiling a module. This must be done carefully to not break certain invariants and optimizations that web runtimes currently rely on.
This issue does not propose any specific values to import at compile time. I will file a separate issue to propose some possible builtins and a process around standardizing them.
Design constraints
Modules can be shared across web workers using postMessage
Not all import values are shareable across workers. We must be able to send shareable import values across workers, and reject unshareable values.
For an initial version, we could disallow sharing modules that have any compile-time imports, but this would need to be solved in the fullness of time.
Compiled modules can be serialized to the network cache
Web engines can cache optimized code in a network cache entry keyed off an HTTP fetch request. If the optimized code is specialized to runtime-provided import values, we will need to expand the cache key to include those values. There is a risk that specializing to values that change on every page load could effectively disable code caching.
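To make the cache-key concern concrete, here is a minimal TypeScript sketch of how an engine might fold compile-time imports into a code-cache key. Everything in it is hypothetical: real engines use their own internal key formats, and valueId assumes that each shareable value has a stable identifier that is identical across page loads.

```typescript
// Hypothetical sketch: how an engine might key its code cache when compiled
// code is specialized to compile-time imports. Not any browser's real format.

// Assumption: every shareable import value has a stable identifier that is
// the same on every page load (e.g. the name of a standard builtin).
interface CacheKeyImport {
  module: string;
  name: string;
  valueId: string;
}

// Combines the fetch URL with every compile-time import. Entries are sorted
// so that the order of the imports list does not change the key.
function cacheKey(url: string, imports: readonly CacheKeyImport[]): string {
  const parts = imports
    .map((i) => JSON.stringify([i.module, i.name, i.valueId]))
    .sort();
  return JSON.stringify([url, ...parts]);
}
```

If a valueId changes on every page load, every load produces a new key and the cached code is never reused, which is exactly the risk described above.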
Decoding the imports section can happen on a different thread from the imports object
Parsing and compiling a module can happen on background threads which cannot perform property lookups on an imports object.
Reading from the imports object requires knowing the keys
See ‘read the imports’. Import value lookup is performed using JavaScript ‘get property’ which requires knowledge of the key you’re looking up. It’s not possible to pull all possible values from the imports object eagerly as it may be a JavaScript proxy or other exotic object which does not provide iteration over all possible keys.
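A short TypeScript illustration of this constraint: it relies only on standard Proxy behavior, and the namespace contents are arbitrary placeholders.

```typescript
// An imports object backed by a Proxy. Any lookup by key succeeds, but the
// object reports no own keys, so there is nothing to enumerate up front.
const lookedUp: string[] = [];

const importObject = new Proxy<Record<string, unknown>>(
  {},
  {
    get(_target, key) {
      if (typeof key !== "string") return undefined;
      lookedUp.push(key);
      // Synthesize a namespace object on demand for any module name.
      return { identity: (x: number) => x };
    },
    ownKeys() {
      return [];
    },
  },
);

// 'Get property' with a known key works; this is what 'read the imports'
// does for each import the module declares.
const envNamespace = importObject["env"];

// Eager enumeration sees nothing, so the values cannot be pre-collected.
const enumerableKeys = Object.keys(importObject);
```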
We should standardize the web interfaces that can be specialized to
Specializing to an import can be critical to the runtime performance of the module. We should provide strong guarantees about when specialization happens and to which imports.
Do not conflict with future core wasm features
Do our best to not conflict with potential future wasm proposals, such as pre-imports, staged compilation, module linking, or the component model. Make minimal or no changes to the core specification.
Proposal
Add a WebIDL attribute for shareable
This attribute is to be used on WebAssembly builtins, and possibly other Web interfaces in the future. Values of interfaces with this attribute can be used with the structured clone algorithm and as compile-time imports.
Because they are well defined for structured clone, they are valid to send through postMessage. We may prevent them from being stored in user-facing persistent storage, such as IndexedDB; this is already the case for modules.
Modify the JS-API compilation methods to accept optional options
dictionary WebAssemblyImportValue {
  required USVString module;
  required USVString name;
  required any value;
};
dictionary WebAssemblyCompileOptions {
  sequence<WebAssemblyImportValue> imports;
};
interface Module {
  constructor(BufferSource bytes, optional WebAssemblyCompileOptions options = {});
  ...
};
namespace WebAssembly {
  Promise<Module> compile(
      BufferSource bytes,
      optional WebAssemblyCompileOptions options = {});
  ...
};
The imports field is a list of import values to apply when compiling. It is not the same kind of imports object as the one used when instantiating, because of the design constraints above around threading and ‘get property’.
Every import key of ‘module’ and ‘name’ must be specified at most once, or else a LinkError will be thrown. Every value provided must have the shareable WebIDL attribute.
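The two rules above (unique keys and shareable values) can be sketched as a validation pass. This is a hypothetical TypeScript model, not spec text: LinkError here is a local stand-in for WebAssembly.LinkError so the snippet is self-contained, and isShareable with its SHAREABLE brand is an assumed placeholder for the shareable WebIDL attribute, which has no JavaScript-visible predicate in this proposal.

```typescript
// Local stand-in for WebAssembly.LinkError, so the sketch is self-contained.
class LinkError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LinkError";
  }
}

interface WebAssemblyImportValue {
  module: string;
  name: string;
  value: unknown;
}

// Hypothetical brand standing in for the shareable WebIDL attribute.
const SHAREABLE = Symbol("shareable");

function isShareable(value: unknown): boolean {
  return typeof value === "object" && value !== null && SHAREABLE in value;
}

// Throws on the first duplicate (module, name) key or non-shareable value.
function validateCompileImports(imports: readonly WebAssemblyImportValue[]): void {
  const seen = new Set<string>();
  for (const { module, name, value } of imports) {
    // Encode the pair as JSON so ("a.b", "c") and ("a", "b.c") stay distinct.
    const key = JSON.stringify([module, name]);
    if (seen.has(key)) {
      throw new LinkError(`duplicate compile-time import "${module}" "${name}"`);
    }
    seen.add(key);
    if (!isShareable(value)) {
      throw new LinkError(`compile-time import "${module}" "${name}" is not shareable`);
    }
  }
}
```

The order in which the two checks run is not specified by the proposal; this sketch simply reports whichever problem it meets first.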
Compiling a module with imports extends the ‘Compile a WebAssembly module’ algorithm with a check that the import values are compatible with the module. This could be expressed with a new embedding function module_validate_imports(module, externval), which only performs import matching and does not mutate the store. Any issues are reported with a LinkError.
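A rough model of the proposed module_validate_imports step: a pure function over the module's declared imports and the supplied values, returning errors instead of touching any store. The type representation is a deliberate simplification (exact matching of function signatures and global value types, with no subtyping, tables, or memories), and the proposal does not say how supplied entries that the module never imports should be treated; this sketch ignores them.

```typescript
// Simplified, hypothetical extern types. Real wasm import matching also
// covers tables, memories, tags, and subtyping.
type ExternType =
  | { kind: "func"; sig: string }
  | { kind: "global"; valtype: string };

interface ImportEntry {
  module: string;
  name: string;
  type: ExternType;
}

function matches(expected: ExternType, actual: ExternType): boolean {
  if (expected.kind === "func" && actual.kind === "func") {
    return expected.sig === actual.sig;
  }
  if (expected.kind === "global" && actual.kind === "global") {
    return expected.valtype === actual.valtype;
  }
  return false;
}

// Pure import matching: reads its inputs and returns error messages, never
// mutating anything. A caller would turn a non-empty result into a LinkError.
function moduleValidateImports(
  declared: readonly ImportEntry[],
  provided: readonly ImportEntry[],
): string[] {
  const errors: string[] = [];
  for (const p of provided) {
    // A module may import the same key more than once; check every match.
    for (const d of declared) {
      if (d.module === p.module && d.name === p.name && !matches(d.type, p.type)) {
        errors.push(`${p.module}.${p.name}: incompatible import type`);
      }
    }
  }
  return errors;
}
```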
The provided import values are stored in the module object. Any import provided at compile time does not need to be provided during instantiation, and the WebAssembly.Module.imports() static method will exclude these imports from its listing.
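The effect on WebAssembly.Module.imports() can be sketched as a filter over the module's import descriptors. The descriptor shape (module, name, kind) matches the entries that method returns today; remainingImports is a hypothetical helper, not part of any API.

```typescript
// Same shape as the entries returned by WebAssembly.Module.imports().
interface ModuleImportDescriptor {
  module: string;
  name: string;
  kind: string;
}

interface ImportKey {
  module: string;
  name: string;
}

// Imports supplied at compile time are stored in the module and no longer
// listed, so only the remaining ones must be provided when instantiating.
function remainingImports(
  declared: readonly ModuleImportDescriptor[],
  compileTime: readonly ImportKey[],
): ModuleImportDescriptor[] {
  const supplied = new Set(compileTime.map((k) => JSON.stringify([k.module, k.name])));
  return declared.filter((d) => !supplied.has(JSON.stringify([d.module, d.name])));
}
```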
The engine may specialize compiled code to any provided import value if it deems this profitable. If an engine exposes WebAssembly standard builtins, specialization to them is expected to be guaranteed.
Because every shareable value is valid for structured clone, the compiled module can always be sent with postMessage. Shareable values are also expected to be safe for the browser to store in the network cache.
Open Questions
Should we throw a LinkError when importing a non-shareable value?
The above proposal throws a LinkError if you import a non-shareable value. It would be possible to allow importing a non-shareable value if we prevented the resulting module from being sent with postMessage. Web engines with module caching would likely not specialize to those values, in order to keep the resulting module cacheable.
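Under that alternative, an engine's per-module decision could look like the sketch below. It is a hypothetical policy model in TypeScript, not engine code: it specializes only to shareable values (so the compiled code stays cacheable) and marks the module as sendable with postMessage only if every compile-time import is shareable.

```typescript
interface CompileTimeImportInfo {
  module: string;
  name: string;
  shareable: boolean;
}

interface CompilePlan {
  // Imports the engine may specialize compiled code to.
  specializeTo: string[];
  // Whether the resulting module may be sent with postMessage.
  canPostMessage: boolean;
}

function planCompilation(imports: readonly CompileTimeImportInfo[]): CompilePlan {
  return {
    // Non-shareable values are not specialized to, keeping the code cacheable.
    specializeTo: imports.filter((i) => i.shareable).map((i) => `${i.module}.${i.name}`),
    canPostMessage: imports.every((i) => i.shareable),
  };
}
```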
Is there a better design for the compile-time imports object?
Having a different structure for the two different kinds of imports is very unfortunate.
We could use the same imports object at compile time as at instantiation, but engines would be forced to bounce back to the originating JS thread to read the imports object when they're compiling off the main thread. Maybe there is another option here?
Maybe there is another approach that might work: partial application.
The idea would be to partially apply a module to imports (existing imports, not a new way of importing); the result would not be an instantiated module but a new, slightly more constrained module.
This doesn't address the issue of sharing across workers though...
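One way to picture partial application is as a function from a module to a new, more constrained module that can itself be applied again. This is a hypothetical TypeScript model only: imports are identified by simplified "module.name" strings, and it says nothing about when an engine would generate machine code.

```typescript
// A module that still needs some imports; applying values yields a new module
// rather than an instance. Keys are simplified "module.name" strings.
interface PartiallyAppliedModule {
  readonly unresolved: ReadonlySet<string>;
  readonly bound: ReadonlyMap<string, unknown>;
}

function partiallyApply(
  m: PartiallyAppliedModule,
  values: ReadonlyMap<string, unknown>,
): PartiallyAppliedModule {
  // Copy instead of mutating, so the original module stays usable.
  const unresolved = new Set(m.unresolved);
  const bound = new Map(m.bound);
  for (const [key, value] of values) {
    if (!unresolved.delete(key)) {
      throw new Error(`${key} is not an unresolved import of this module`);
    }
    bound.set(key, value);
  }
  return { unresolved, bound };
}
```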
The main issue with that is that it would prevent engines from compiling any machine code until instantiation (or some other signal that partial application is finished) if they want to specialize to an import value.
I don't see that conclusion. But, it would depend on how partial application was implemented of course.
How do you see this design interacting with future pre-imports/staged compilation proposals? Would they be:
- Slightly different ways to achieve similar effects
- Subsumed by those future features
- Something else?
My understanding of the benefit of full-featured pre-imports is that it would allow for more than one layer of staging. A module could pre-import and pre-export to compose modules at compile-time.
This proposal only allows passing imports to the compile methods; with pre-imports, the module object might expose exports as well. That would make this proposal a subset of the more general feature.
However, I don't know what the main use case for multiple layers of staging is, so I don't know what we'd be missing from the MVP.
A similar conclusion in this related issue was that the main feature you need is guaranteed inlining for these pre-imports: WebAssembly/relaxed-simd#103
My main interest there was not relaxed-simd itself, but similarly pre-importing performance-sensitive function calls (this could also apply to imported types).
I think the key thing here is that engines would be expected (and allowed) to do the optimal thing when compile-time importing a wasm builtin (#1480). And except for rare odd circumstances, I'd expect that to involve inlining.
I favor incremental approaches, and I am not familiar with all the technical depths and discussions on this topic. Therefore, I will only share a potential user's perspective on compile-time imports and on the importance of general inlining support for this vision, along with a request for feedback to make sure this has been considered.
I would be interested to hear whether you see your proposal as in line with an incremental approach toward more general inlining support, as orthogonal to it, or as having constraints that would rule out general inlining support. Apologies if this has been discussed earlier, but I think it's important enough to make sure it is considered for the future direction of WebAssembly.
While I believe efficient string handling is essential in the near term, in the future I also see a need for more general support for inlining imported function calls (between wasm modules, or between a wasm module and its host). My hope is that this would allow shared libraries, such as standard libraries, to be externalized from the modules that use them in order to reduce their size (e.g. IIRC the main chunk of a small Rust module is the string standard library, and this is probably true for all languages). It could also serve the need to import host-specific functions, like JS APIs in the context of stringref, or host-specific instructions for hardware-specific functionality.
I strongly support this idea. As @eqrion pointed out, this has many potential use cases, far beyond strings, so would be a more general (and in some important ways, simpler) solution.
In essence, this is the simplest version of a more general staging mechanism -- and perhaps it's all we ever need. Even if not, I think it could be extended to more general staging later. The main additional feature that full staging requires, as I see it, is the ability to extract exports from a stage-applied module, so that they can be used as pre-imports to other modules. But compile-time imports could be viewed as a shorthand for a special case where this isn't needed. If names of pre-imports and regular imports cannot overlap, then I foresee no ambiguity, and we could allow compile time imports to supply either.
The only constraint here is that in the future, there generally will be dependencies between some imports. For example, with type imports, the type of a function import may depend on an imported type, and it doesn't make sense to supply the function import without the type it depends on. With staging, additional dependencies could arise. So we must be prepared to perform dependency checks on the supplied compile-time imports and raise LinkErrors if they are violated. I don't see a problem with that; I'm just making people aware.
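The dependency check mentioned above could look roughly like this, modeled in TypeScript. The dependsOn lists and the "module.name" keys are hypothetical; in a real engine they would be derived from the module's import section (for example, a function import whose signature refers to an imported type). A caller would raise a LinkError for any reported violation.

```typescript
// Hypothetical model: each import lists the other imports its type depends on.
interface DeclaredImport {
  key: string; // simplified "module.name" identifier
  dependsOn: readonly string[];
}

// Returns one message per supplied import whose direct dependency was not
// supplied. Checking every supplied import also covers transitive chains.
function checkCompileTimeDependencies(
  declared: readonly DeclaredImport[],
  supplied: ReadonlySet<string>,
): string[] {
  const errors: string[] = [];
  for (const imp of declared) {
    if (!supplied.has(imp.key)) continue;
    for (const dep of imp.dependsOn) {
      if (!supplied.has(dep)) {
        errors.push(`${imp.key} depends on ${dep}, which was not supplied`);
      }
    }
  }
  return errors;
}
```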
The shareable attribute is a great idea to solve the sharing problem. I would not necessarily restrict compile-time imports to shareable values, but only use the flag to define and obtain shareable modules. Linking and sharing are separate mechanisms, and there may be use cases of compile-time imports that do not require sharing the module afterwards.
Personally, I would try to bikeshed the import list differently and still use the same form of import object as instantiation, going through the property iteration protocol, and defining that only enumerable/iterable properties are considered. That may improve ergonomics, especially in conjunction with builtin name spaces (as per #1480). But that's a minor point as far as I'm concerned.
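A sketch of that alternative: flattening an instantiation-style imports object into the list form by iterating own enumerable properties with Object.entries. importsObjectToList is a hypothetical helper. Object.entries still runs user code (getters and, for a Proxy, its traps), so this conversion would have to happen on the calling JS thread before compilation moves off-thread.

```typescript
interface WebAssemblyImportValue {
  module: string;
  name: string;
  value: unknown;
}

// Only own enumerable string-keyed properties are visited, at both levels.
function importsObjectToList(
  importObject: Record<string, Record<string, unknown>>,
): WebAssemblyImportValue[] {
  const list: WebAssemblyImportValue[] = [];
  for (const [module, namespace] of Object.entries(importObject)) {
    for (const [name, value] of Object.entries(namespace)) {
      list.push({ module, name, value });
    }
  }
  return list;
}
```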
I am also strongly in favor of this. This is a much more extensible approach that allows experimentation with different types of new builtins without changing Wasm's core binary format or introducing new types. AFAICT, and as discussed elsewhere, this should generalize to Wasm type imports in the future. A key issue with imported (abstract, perhaps unconstrained) types is the engine knowing their representation in order to generate specialized code, which this direction seems like it can support.
While I like the generality of partial application as @fgmccabe suggested, in the Web APIs there is already an expectation of both the cost and meaning of each API call, so retaining a point of control for applications (i.e. WebAssembly.compile generates code and needs compile-time imports to do so) allows them to optimize their costs. That doesn't preclude having additional incremental binding APIs in the future.
It seems a bit obvious to me in retrospect, but this extension does not need to be limited to the JS-API and could extend the C/C++ APIs as well. I'm not sure what constraints exist there, but I think it would make sense to extend those too.
The other issue besides shareability across threads is serializing compiled modules (e.g. into a network cache). Host-provided builtins are both shareable and serializable, which led me to collapse the two properties into one attribute that requires both.
The difficulty with non-serializable imports (which are basically all import values except constant values) is that it would force engines to choose between specializing to the import or having a serializable module. It seemed better to me to only support the cases where we could both serialize the result and specialize to the import value.
I think that only considering enumerable properties could solve the issue, but I'm not sure whether there's precedent for that in JavaScript API design. I'll have to ask around.
One concern I have is that it could be confusing to have two things called 'imports objects' that are interpreted with different techniques: one by property lookup and the other by iteration. This matters especially because real users rely on the property lookup mechanism (using proxies). But maybe that's fine?
There is now a phase 1 proposal for this. I'm going to close this in favor of discussion on that repository.