how about dlopen
If a Wasm runtime is going to mimic `dlopen()`, and the library name may be passed in a variable so that there is no way to predict it in the loading/validation phase, does the runtime need to "patch" or "re-instantiate" the existing "moduleinst" after loading the library requested by `dlopen`? Or should it keep multiple instantiated modules and share them?
My impression is that, in a wasm context, `dlopen()` needs to be implemented in terms of two distinct capabilities:
- some way of turning raw bytes into a wasm module
- some way of instantiating a module by supplying its imports, receiving its exports
The JS API can achieve both of these (and implement `dlopen()`) via:
- to turn bytes into a module: `new WebAssembly.Module()`, `WebAssembly.compile()`, `WebAssembly.compileStreaming()`, and maybe one day via extension to ESM-integration
- to turn a module into an instance: `new WebAssembly.Instance()`, `WebAssembly.instantiate(module)`, and maybe one day via extension to ESM-integration
If you were to literally implement `dlopen()` in terms of the JS API, you'd still have to figure out what "filesystem" the given path string is resolved against (it could be a URL that is `fetch()`ed, a key into IndexedDB, a path into an Emscripten-virtualized filesystem, etc.), but this is all something a toolchain can implement (and should implement, given the diversity of valid options).
Outside of a JS environment, the module-linking proposal currently provides:
- to turn bytes into a module: nested `(module ...)` definitions
- to turn a module into an instance: `(instance (instantiate ...))` definitions
Both of these are too static to support the full dynamism of `dlopen()`, so both would need to be extended to be more dynamic, but I think in different ways in different proposals:
- to turn bytes into a module at runtime:
  - core wasm could support a runtime `compile` instruction for turning bytes in linear memory into a `(ref (module T))`. This effectively gives core wasm the power of `new WebAssembly.Module()`. This would offer full JIT-ability of wasm, but not all hosts will support JIT, so not all hosts would want to support this instruction. There's also a whole subtle issue with synchronous-vs-asynchronous compilation for some browsers that's hard to address in core wasm.
  - WASI could define an interface with `compile : async function(code: stream<u8>) -> handle<Module>`, making this a host capability that modules could import instead of a core wasm instruction. On the Web, this WASI interface could be implemented in terms of `WebAssembly.compileStreaming()`. The `async`ness would address the synchronous concern, but the not-all-hosts-want-to-support-JIT concern remains.
  - WASI could define various interfaces like `load : async function(path: string) -> module<T>`, allowing the `path` string to be resolved against a host-defined namespace. For JIT-less hosts, this namespace could, e.g., be some plugin library of AOT-compiled modules -- the host would have a lot of flexibility here. In the context of `wasi-filesystem`, we'd want something more like `load-at : async function(handle<Directory>, path:string) -> module<T>`.
- to turn a module into an instance at runtime, my current thinking is to allow the `(instantiate ...)` instruction, which is currently only allowed in `(instance ...)` definitions, to also be used as a first-class instruction at runtime. I think there are some important variations to consider here:
  - Adding `instantiate` as an instruction to core wasm, which produces a `(ref (instance T))`. Because of mutability, `instance`s can form cycles and thus in general this instruction requires the host to have a GC, which not all hosts want to do.
  - Adding an `instantiate` instruction to Module Linking which exclusively instantiates shared-nothing Interface-Typed components, returning an explicitly-owned `handle` to an instance. With this restriction, instance lifetimes would be explicitly managed, not requiring GC, and allowing things like prompt release of linear memory.
  - With both of these instructions, there are two additional options to consider:
    - If the `instantiate` instruction accepts a dynamic `(ref (module T))`, then the exports of the returned `(ref (instance T'))` are fully dynamic, and so to call an exported function, there would need to be an accompanying `get_export` instruction that takes a `(ref (instance T'))` and returns a `(ref (func X->Y))` that can be called via `call_ref`. This is what's needed to express the full dynamism of `dlopen()` which, symmetrically, returns a function pointer.
    - The `instantiate` instruction can also take the module to instantiate as an immediate (i.e., `instantiate $M` where `$M` is a static index referring to a nested or imported module). This allows the exports' code to be known AOT, which allows more aggressive AOT optimization (e.g., cross-module inlining). So instead of a `get_export` instruction returning a dynamic function reference, there could instead be a `call_export $M $export` instruction that statically identifies the `$export` to be called.
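The fully dynamic variant can be approximated with today's JS API, which may help make the `get_export` idea concrete. In this sketch, `instantiate` and `getExport` are hypothetical helper functions (they do not define the semantics of any proposed instruction): `getExport` resolves a name chosen at runtime and uses `WebAssembly.Module.exports()` to check that it is a function before handing back a callable reference. The `add` module bytes are hand-assembled for the example.

```javascript
// Tiny module exporting add(i32, i32) -> i32, hand-assembled for illustration.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// Dynamic counterpart of (instantiate ...): the module is a runtime value.
function instantiate(module, imports = {}) {
  return new WebAssembly.Instance(module, imports);
}

// Rough analogue of get_export: the export name is a runtime string, and the
// result is only handed back if the module declares it as a function.
function getExport(module, instance, exportName) {
  const descriptor = WebAssembly.Module.exports(module)
    .find((entry) => entry.name === exportName);
  if (descriptor === undefined || descriptor.kind !== "function") {
    throw new TypeError(`no function export named "${exportName}"`);
  }
  return instance.exports[exportName];
}

const mod = new WebAssembly.Module(bytes);
const inst = instantiate(mod);
const fn = getExport(mod, inst, "add"); // plays the role of a function reference
console.log(fn(20, 22)); // 42
```

In the static variant, by contrast, both the module and the export name would be fixed in the calling module's code, so no such runtime lookup or check would be needed.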
I hope this helps give a picture of the space of options we're imagining going forwards and also why it made sense to start with just the simplest case, so we can start making incremental progress on that. To wit, this spectrum of options is illustrated in this slide from the component model presentation to the CG in April.
Super love the idea of using `(ref (func X->Y))`, which helps a lot. And a new layer is definitely a doable way to improve the current linking solution, especially once nested `module` and `instantiate` are involved.
Since I noticed that only `func` is mentioned as an example, I am hoping to get more insight into `memory`. If we are using `ref instance`, I think it suggests keeping every memory and its `data` separate. So how do we pass memory data from one nested module to another? Currently, the best way is to go through a third module or to use one globally shared memory.
Hi, great questions. With the declarative form of `instantiate` (inside `instance` definitions), `memory` and `table` definitions could be imported and exported just like functions. To enable the runtime `instantiate` instruction, I think we would need first-class `(ref (memory ...))` and `(ref (table ...))` types so that dynamic references could be passed as arguments to `instantiate` to satisfy `memory` and `table` imports, and also returned from `get_export` for `memory` and `table` exports. Incidentally, having first-class memory and table references (and extending `load`/`store`/`table.set`/`table.get` etc. to operate on these reference types) has been a rough plan since the early days of wasm and would enable various other advanced use cases.
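For comparison, the JS API already treats memories as first-class values, so the shared-everything case can be sketched there today. Below, one `WebAssembly.Memory` is passed as the `env.mem` import of two instances, which is roughly what a `(ref (memory ...))` argument to a runtime `instantiate` would allow. The module bytes (importing `env.mem`, exporting `store` and `load`) are hand-assembled for this example, and the import names are arbitrary choices.

```javascript
// Module: (import "env" "mem" (memory 1)),
//         (export "store") : (i32 addr, i32 value) -> (),
//         (export "load")  : (i32 addr) -> i32
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x0b, 0x02,                                           // type section, 2 types
  0x60, 0x02, 0x7f, 0x7f, 0x00,                               //   (i32, i32) -> ()
  0x60, 0x01, 0x7f, 0x01, 0x7f,                               //   (i32) -> i32
  0x02, 0x0c, 0x01,                                           // import section, 1 import
  0x03, 0x65, 0x6e, 0x76, 0x03, 0x6d, 0x65, 0x6d,             //   "env" "mem"
  0x02, 0x00, 0x01,                                           //   memory, min 1 page
  0x03, 0x03, 0x02, 0x00, 0x01,                               // two functions: type 0, type 1
  0x07, 0x10, 0x02,                                           // export section, 2 exports
  0x05, 0x73, 0x74, 0x6f, 0x72, 0x65, 0x00, 0x00,             //   "store" = func 0
  0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x01,                   //   "load"  = func 1
  0x0a, 0x13, 0x02,                                           // code section, 2 bodies
  0x09, 0x00, 0x20, 0x00, 0x20, 0x01, 0x36, 0x02, 0x00, 0x0b, //   i32.store
  0x07, 0x00, 0x20, 0x00, 0x28, 0x02, 0x00, 0x0b,             //   i32.load
]);

const sharedMemory = new WebAssembly.Memory({ initial: 1 });
const mod = new WebAssembly.Module(bytes);

// The same memory object satisfies the memory import of both instances.
const writer = new WebAssembly.Instance(mod, { env: { mem: sharedMemory } });
const reader = new WebAssembly.Instance(mod, { env: { mem: sharedMemory } });

writer.exports.store(16, 1234);
const seen = reader.exports.load(16);
console.log(seen); // 1234
```

Everything either instance writes is visible to the other, which is exactly the shared-everything property that makes raw pointers meaningful across the two.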
What do you mean by first-class `(ref (memory ...))`? My concerns here are:
- Will `(ref (memory ...))` bring a security risk? Unless we can separate sensitive memory data from the rest, sharing a whole memory may mean sharing everything.
- All my memory/shared-memory questions come from the pointer concept, as in `char *strcpy(char *dest, const char *src)` and `char *strdup(const char *s)`: how do we pass memory data from one module to another, and how do we access one module's memory from another? It might not only need to indicate whose memory it is (`src` and `dest` in `strcpy()`) but also require code space (to tell `strdup` whose memory should be used)?
That's a good point, first-class memory references would share everything. Since we were talking about `dlopen()`, I thought maybe we were talking about literally implementing `dlopen()`, which inherently assumes shared-everything.
For shared-nothing dynamic linking, then you're right that we wouldn't want to use first-class memory references. Instead, I think we'd want to use interface types to copy values between the memories.
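To make the copying concrete: a shared-nothing boundary behaves roughly like the sketch below, where the two sides own separate memories and a string crosses by value. This is only an analogy in plain JS; `copyBytes` is a hypothetical helper and the offsets are arbitrary. It does not reflect the actual interface types design, which would express the copy declaratively rather than through host code.

```javascript
// Two independent linear memories, standing in for two shared-nothing instances.
const memA = new WebAssembly.Memory({ initial: 1 });
const memB = new WebAssembly.Memory({ initial: 1 });

// Copy `len` bytes from `src` at `srcPtr` into `dst` at `dstPtr`.
// Neither side ever receives a pointer into the other's memory.
function copyBytes(src, srcPtr, len, dst, dstPtr) {
  const from = new Uint8Array(src.buffer, srcPtr, len);
  new Uint8Array(dst.buffer).set(from, dstPtr);
}

// Side A holds a string at offset 8 of its own memory.
const text = new TextEncoder().encode("hello");
new Uint8Array(memA.buffer).set(text, 8);

// The value is copied across; side B sees it at offset 64 of its own memory.
copyBytes(memA, 8, text.length, memB, 64);
const received = new TextDecoder().decode(
  new Uint8Array(memB.buffer, 64, text.length));
console.log(received); // hello

// Later writes on side B do not affect side A's copy.
new Uint8Array(memB.buffer)[64] = 0x4a; // 'J'
const firstByteInA = new Uint8Array(memA.buffer)[8];
```

The cost of this isolation is the copy itself, which is why pointer-passing functions do not translate directly.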
`strcpy` and `strdup` are tricky to lift into a shared-nothing context since they assume shared-everything by passing raw pointers (to an implicit shared linear memory). There is a long-standing idea around adding first-class "slices" of memory (that you could `load` and `store` from) as a core wasm feature (so that `strcpy` could take a slice as its `dest` parameter), but this hasn't been prioritized yet since it ends up being tricky to use in practice for linear memory languages, where the ambient assumption is that all pointers are relative to the default linear memory. You'd need special compiler+language extensions to represent slices and "pointers into slices" and code would need to be specially written to use them. In any case, that would be a separate core wasm proposal; distinct from (but complementary to) module linking.
Maybe we can use `dataref` to open a fixed-size window on a sub-module instance's memory. Rather than using a map to translate a reference into a sub-module instance's heap, like an exported function, into a reference into the main module's heap, why not share the heap space among all instances of a main module and its sub-modules, or of a component?
Or maybe I misunderstand the meaning of `(ref (func X->Y))`.
As core wasm features, function references (which is what I mean by `(ref (func X->Y))`) are indeed meant to be used primarily in shared-everything scenarios where memory is shared between the caller and callee. Beyond that, I'm not sure exactly how what you're asking about is distinct from the "slices" idea I mentioned in my last comment.
I guess the options are:
- keep the instances' linear memories separate and either open small windows (slices), open them completely (`ref memory`), or copy by declaration (interface types)
- or maintain only one linear memory and merge the instances' table/data/heap into the main table/data/heap.
Yes, I think that's a nice summary.
thanks. great talk.