Reflect with transitive dependencies
Jack-Works opened this issue · 6 comments
I have implemented a PoC in a Webpack plugin to support import(spec, { reflect: "module"})
. I found the current reflection is too weak to support real-world use cases.
Use case
Import the untrusted npm module as a whole to provide virtualization.
Let's say we have an untrusted npm module mod
like this:
// mod/index.js
import { assert } from './utils.js'
import { count } from './count.js'
import { readFile } from 'node:fs/promises'
In the current proposal, if we want to fully virtualize the mod
module, we need to manually collect all the transitive dependencies of mod
, for example: mod/utils.js
and mod/count.js
.
This is not practicable and with low performance (you need to reflect every module and manually link them together by the importHook
).
Currently: the importHook
will be called for ./utils.js
, ./count.js
, and node:fs/promises
Suggestion
The module reflection, by default, fetches and links all its transitive dependencies, except the module that is intrinsic (has no source and is implemented by the host, like node:fs
).
Still the same module above, with this suggestion, it will only call importHook
for all Node built-in modules. mod
and it's all its transitive dependencies are evaluated in the given Evaluator.
Peer dependencies
If this mechanism does not support opt-out, it will be not usable. Some module does not allow multiple instances (like "react") otherwise it will not work correctly. We can provide an opt-out mechanism like this.
await import("react-window", { reflect: "module", reflectExclude: ["react"] })
In this way, this module reflection will not capture react
and we need to manually provide it in the importHook
.
Today's behavior
await import("mod", { reflect: "module", reflectExclude: "*" })
Use a "*"
string instead of an array to get today's behavior. (not link anything)
Pros
- Better ergonomics, easier to adopt
- Performance gain (many modules can be linked by the engine. less user code, better performance)
Cons
- We need to introduce the
resolveHook
back again in the compartment proposal. - Reflected modules have different identities if the
exclude
options are different.
I agree that for the use case of "virtualize a graph", you want a construct that allows you to "pluck a subgraphgraph, down to an exclude boundary".
But note that this proposal is not solving that problem, and that such a solution conflicts with the primary use case of this proposal of providing the simplest primitive. Think how realms also requires wrapper code to get to the final use case.
In addition, in the Wasm case, having any linking on by default would break the Wasm use case being solved by this proposal.
There are also implementation cache key questions with this approach in that there is no longer a clear cache key associated with the reflection.
I would recommend, treating your PoC as library code that wraps these primitives, as opposed to changing the primitives. Compartments could possibly also offer further solutions here as part of its API.
treating your PoC as library code that wraps these primitives
it is not viable when implementing it in a bundler because if it is library code, then we need to reflect the transitive dependencies in the runtime, which requires import(unknown_expression_here, { reflect: "module" })
(and link them manually). This is not static analyzable.
Why do you need it to be statically analyzable? Loaders never have been statically analyzable.
In addition, in the Wasm case, having any linking on by default would break the Wasm use case being solved by this proposal.
Opt-in is also okay for me, but can you explain why?
There are also implementation cache key questions with this approach in that there is no longer a clear cache key associated with the reflection.
They're still reflecting the same underlying source, like reflecting it as a ModuleSource or reflecting it as a string (if we're going to have it).
===================
I gathered my thoughts and this is what I was thinking about:
- I need to virtualize a graph for security
- I'm using Webpack so I'm writing a Webpack plugin
- There are no
node_modules
in the runtime so I need to do some static analyzing. - I don't want to implement something unlikely to be standardized in the future. Therefore if the proposed change in this issue is unacceptable, I have to change how I solve the problem.
If I have to write import('some-module', { reflect: "module" })
with the current semantics (no transitive dependencies included), a possible solution is to compile all files that some-module
refers to (in any depth) into a ModuleSource
and store them in a Map
.
import('some-module', { reflect: "module" })
// globalThis.#reflectedModules: Map<[["some-module", ...], ["some-module/lib/index.js", ...]]>
Then in the user land library, when I need to link it in the importHook
, I can use import(unknownSpec, { reflect: "module" })
because it is already in the globalThis.#reflectedModules
(in the bundler case) or just available (in the native implementation).
But this method brings another problem. This way requires me to treat all CommonJS files as a ESM file. Even if I use cjs-module-lexer to analyze and sythentic a ESM module for it, it will still have CJS/ESM interportion problem.
Currently I'm stucked here. If we can have reflected transitive dependencies in this proposal, when the graph looks like this:
(a) reflected ESM => (b) CommonJS => (c) CommonJS
We don't need to reflect (b) and (c) as a ModuleSource, and it can be executed in a host provided CommonJS runtime (uhh how importHook
should interact with CommonJS
in NodeJS)
What you're describing is something like a "module closure" where the module has partial binding to other modules, which in turn have partial binding. This is a very complex concept (just like closures are for functions). WebAssembly module linking has no concept of partial linking (and in fact WebAssembly doesn't even separate linking from execution). This spec is a simple primitive designed to enable the use case (which isn't possible today), not solve the advanced linkage scenarios you describe. I agree those are problems, but they simply don't fall under the use cases or requirements of this proposal at all. I think you should bring these points up under the compartments spec instead.
I solved how to reflect CommonJS modules, so I believe there is no more problem with this.