emscripten-core/emscripten

EXPORT_ES6: Add testing and support for node.js (replace require(), __dirname)

curiousdannii opened this issue Β· 59 comments

So I'm finally getting around to trying EXPORT_ES6 mode, and it just doesn't work in node (when loaded as an actual ES6 Module), because it still refers to __dirname. I assume other people have only been using it Webpack/rollup/whatever?

It doesn't look like there are any tests for EXPORT_ES6 either.

I think we don't have tests for it because at the time there wasn't a version of node that supported it. If there is now, we should add some.

I think we have 12.10 on CI, and can maybe add another version. Another option is to use v8, which we already install a new version of, if it has ES6 module support.

It would be good if someone could write up a test for this and then we can check the options.

Node 12.10 will definitely be enough.

Just FYI, it looks like out current version of node that we use in emsdk and in all our testing is 12.18.1.

jvail commented

Could this issue be extended to cover also "require" vs "import"? I wonder if replacing all lines in the output such as

var nodeFS = require('fs');`

with

#if EXPORT_ES6
const { default: nodeFS } = await import('fs');
#else
var nodeFS = require('fs');

would be sufficient.

Unfortunately my insight in the build system is very limited. But if someone could start then I could help at least with testing.

Realised this is going to be difficult to handle for multi-environment builds. ES import statements can't be put in conditionals, and while the import function can, it needs to be handled async.

ES import statements can't be put in conditionals, and while the import function can, it needs to be handled async.

They don't need to, Emscripten could emit static import declarations instead.

I said it would be difficult for multi-environment builds. A static import declaration for node will fail in the browser.

Ah, right, I see. In that case, the async import sounds fine. The fact that it's async shouldn't be an issue - Emscripten already has a set of async runtime dependencies and a way to wait on them.

Take a look at usages of addRunDependency from

function addRunDependency(id) {
.

@RReverser You mentioned working on this in #13571 , do you have a test case that you were using?

I said I stopped working on it and just waiting for someone else to fix it :)

do you have a test case that you were using?

Absolutely any file in EXPORT_ES6 mode. Try to generate emcc temp.c -o temp.mjs (it will export ES6 module) and then

import Module from './temp.mjs';
Module();

somewhere else. You'll run into the code trying to use __dirname.

The fix for just one configuration wouldn't be hard, but the general issue is that scriptDirectory is assigned from multiple places under different modes, and there needs to be some more systematic fix to align them correctly.

What is the best workaround for this at the moment ? Not using EXPORT_ES6 ?

Yes, just don't use EXPORT_ES6. If you're targeting Node that won't be a problem, even if the rest of your app uses ES modules you'll be able to make it work (possibly by changing the file extension to .cjs, I can't remember the details.)

@lovasoa I still use EXPORT_ES6, I just have a post-build script that removes the commonjs-isms. I do wind up producing separate Node.js and Browser entry points as a result though.

@matthewp I'm using EXPORT_ES6 as well, but I haven't figure out a way to get it compiled. I am able to get pass the __dirname error, but then it complained about require.
I'm curious how you did it. Can you point me to resources you used or the github repo with a working sample? I'm fairly new to this. many thanks in advanced :)

surma commented

In case it’s helpful, this has been my workaround for making EXPORT_ES6 files work in Node.js:

import {dirname} from "path";
globalThis.__dirname = dirname(import.meta.url);
import { createRequire } from 'module';
globalThis.require = createRequire(import.meta.url);

import emscriptenModule from "./my-emscripten-module.js";
// ... business as usual

@yhung119 I just use sed to find and replace the require calls with references to path, etc. and then append appropriate import statements to the top. It does mean that I have separate JS files for Node and Browser which isn't great. Perhaps there is a better way. You can see what I'm doing here: https://github.com/lucydsl/liblucy/blob/77fb82fa6c72a3621de60da61c9d972da0659e6e/Makefile#L19-L26

Here's smaller example of roughly the workaround described by @matthewp above along with a minimal repro (make node-repro) of the issue. https://github.com/thomasballinger/embind-ES6-for-node-and-browser

It looks like this is the best we have to a tracking issue for the general problems with the current EXPORT_ES6. See also the linked issues, and also some relevant discussion in WebAssembly/binaryen#3304 (comment)

Following that, some offline discussion raised the following ideas:

  • For proper ES6 module support, we don't want MODULARIZE mode - we don't need a factory. We just want to emit the code without a factory, directly - being emitted in an ES6 module file takes care of the code being "modular".
  • It seems hard to change the current relevant flags without breaking changes, EXPORT_ES6 and MODULARIZE, and we used to have MODULARIZE_INSTANCE.
  • Instead, perhaps we can deprecate EXPORT_ES6, and add a new flag ESM (or ES_MODULE or such). The goal would be for that to emit exactly the right contents for an ES6 module: no factory, the exports are the expected ones, etc.
  • When that is stable, we could remove the deprecated EXPORT_ES6.

Thoughts?

It looks like this is the best we have to a tracking issue for the general problems with the current EXPORT_ES6.

Hmm, I wouldn't mix them together tbh, but instead raise a new issue if there isn't one for ES6 already. This issue is mostly about Node.js support which has its own ES6+Node.js specific challenges regardless of chosen output format.

It looks like this is the best we have to a tracking issue for the general problems with the current EXPORT_ES6.

Hmm, I wouldn't mix them together tbh, but instead raise a new issue if there isn't one for ES6 already. This issue is mostly about Node.js support which has its own ES6+Node.js specific challenges regardless of chosen output format.

I agree we can/should probably make a seperate issue. I am a little confused by the above statement thought. Isn't this issue all about module? The use of require() and __dirname are perfect fine under normal non-module usage aren't they?

Isn't this issue all about module? The use of require() and __dirname are perfect fine under normal non-module usage aren't they?

Right, but what I'm saying is those issues will happen both in factory output and in output with individual exports, so those different output modes are not really relevant to this Node.js-specific issue.

Factory mode is still important! Why do you think it isn't desired anymore?

@curiousdannii

I should have said that factory mode should be optional. Enabling MODULARIZE could still create the factory function as always.

In #9325 I suggested

What if we deprecated MODULARIZE in favour of a new setting named something like OUTPUT_MODE with options 'plain', 'factory', 'instance'

Although now that I think about it, 'plain' is 'instance? Maybe it would be better to just have a FACTORY setting. Or just keep MODULARIZE. It's kind of a needless rename, but it would communicate so much better what it's purpose is.

Likewise we could have a MODULE_FORMAT setting with options 'none', 'esm', 'umd'. Perhaps the 'none' and 'umd' options could automatically wrap in an IIFE?

These are orthogonal options, so an option that mixes them like your proposed ESM option makes less sense to me. It's just one out of the 6 potential combinations. And if we change the default to 'esm', then it will be even less necessary.

Are you still thinking of implementing #12203?

Oh, good points. I guess I'm not sure what the best options here would be. Perhaps we could rename MODULARIZE to FACTORY as you said. Then specifying EXPORT_ES6 with or without FACTORY would indicate the mode. But if umd is important than a new setting might make sense - I'm not close enough to the JS ecosystem to know what's best here.

I haven't had time for #12203 myself, I'm afraid, due to other urgent stuff, but maybe I will eventually. Would be great if someone else had time though.

UMD is the current default if EXPORT_ES6 isn't specified. A 'none' option would just be slightly less code, and shouldn't be combined with MODULARIZE/FACTORY (because there'd be no way to access the factory function.) However currently the only way to wrap it all in an IIFE is with MODULARIZE, and why MODULARIZE_INSTANCE exists - it's basically just for wrapping, undoing almost all the work that MODULARIZE did πŸ˜†. That would be why renaming these settings would be clearer - it would separate concerns and communicate more clearly about them.

Can you explain the use case for "wrap it all in an IIFE"? If its not a super common use case could this IFFE be injected with --extern-pre-js and --extern-post-js?

The main reason would be that if you load it in a regular web page all the variables won't litter global scope. I'd say it's a very common pattern, and it's only 16 characters. I can't think of any reason not to include it by default for any non ESM output. (ES modules don't need it as they won't litter global scope.)

I see, that makes sense. In that case I wonder if we can/should try to make that the default behaviour? Are there any circumstances where one would want to clutter up the top level namespace?

There might be someone out there who accesses internal variables or functions directly instead of exporting them. But that's such an anti-pattern I don't think it should be a concern to us. And if needed they could use --extern-post-js.

dsyer commented

Here's another workaround, or maybe the same one as by @surma (above), but a more complete example that works in Node.js and in the browser. Starting with this:

$ emcc -Os -s EXPORTED_FUNCTIONS="['_doSomething','_main']" -s EXPORT_ES6=1 bar.c -o bar.js

I have foo.js (boilerplate) that loads bar.js (the generated code)

if (typeof fetch === 'undefined') {
  await import('path').then(path => globalThis.__dirname = path.dirname(import.meta.url));
  await import('module').then(module => globalThis.require = module.createRequire(import.meta.url));
}

import { default as wasm } from './bar.js';
export default await wasm({ locateFile: function (path) { return './' + path; } });

In the browser:

<html>

<body>
	<script type="importmap">
		{
			"imports": {
				"foo": "./foo.js"
			}
		}
	</script>
	<script type="module">
		import bar from 'foo';
		bar._doSomething();
	</script>
</body>

</html>

and in Node.js:

$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> var bar = (await import("./foo.js")).default
Starting...
undefined
> bar._doSomething()

Others more expert than me in JavaScript can probably improve on that, but it works at least for my simple example. Some ideas:

  • You could remove the hardcoded foo.js for instance by using a naming convention based on import.meta.url.
  • The hack above with the custom locateFile might not be needed with latest Emscripten (I only had 3.1.2 and older to test): #15765 doesn't seem to be in "latest" fetched from emsdk.

Is there any good advice on React?

Any update on this? I appreciate the workarounds, they've been working for me locally but I don't see a good way to extend this generically to users of my library given that some users will run in web and some will run in node. I'd basically have to offload these workarounds onto them, as best as I can see. Would prefer not to do that!

Edit: what exactly does the exported module use __dirname/require for? It seems to want to find things on disk (requiring path and fs), which are not even features I want to enable. Is it possible to set some other flag to say "don't worry about anything on the disk" that will convince emscripten not to output these in my bundle?

Edit 2: It seems that if I simply replace __dirname with "." in my bundle then everything is fine; I can polyfill the require() calls with noops and they never get hit. But I don't understand the implications of these changes; what functionality am I even risking by clobbering these things?

Thanks

You can see here the various environment features Emscripten uses: #12203 (comment)

@curiousdannii thanks, I'm curious in that case if it's possible to set the right set of compile flags to build a result which doesn't reference path, __dirname, or fs at all? I have -s FILESYSTEM=0 but that doesn't seem to do the trick.

-sENVIRONMENT=web will exclude all Node inclusions. But if you want to use the one JS output in both browser and Node you'll have to manually load the wasm binary.

What I really want is an ES6 module default export function that lets me slam the bytes in there, and doesn't try to manage getting the wasm bytes. Web & node work the same, if you already have the bytes.

For web I could do it like this, if mymod.js was generated this way:

import mymod from './mymod.js'
const MyMod = mymod(await fetch('mymod.wasm').then(r => r.arrayBuffer()))

In node I could do it like this:

import mymod from './mymod.js'
import { readFile } from 'fs/promises'
const MyMod = mymod(await readFile('mymod.wasm'))

for both, I could embed the bytes in the JS file, and it would work great:

import mymod from './mymod.js'
const wasmBytes = new Uint8Array([
 // bytes from a tool like xxd
])

export default mymod(wasmBytes)

As it is now, I have to go into the generated code and gut all the "find the wasm bytes" code in the entrypoint.

Web & node work the same, if you already have the bytes.

They really don't; the generated JS bindings are quite different, especially for things like filesystem support but also others.

They can work, depending on what you do with it. I use ENVIRONMENT=web for both browser and Node in my project.

They really don't; the generated JS bindings are quite different, especially for things like filesystem support but also others.

They do, though, as my examples show. I don't mean the existing generated bindings, I mean improving them (as I show in those 2 examples, they work the same.)

I have a concrete example. The generated js wrapper had all this stuff to download bytes if on web or read a file in node. My solution was to just gut all that, and inline the bytes in the file. Now, it will work on the web, with any web location that doesn't need to be managed, or in a node native AudioContext (polyfilled in C.) if the entry-point itself could take some bytes and just instantiate that, i could manage it however I want (with fetch, read a file, or even just inline the bytes somewhere, and pass them.

It would be cool if I didn't have to do this by hand, is my point. And if you look at the generated js, it's still got a ton of unneeded things to "get the bytes" that I just haven't bothered removing (but they do nothing.)

@konsumer You can pass in the file as wasmBinary to the Module constructor. Is that enough for your needs?

I am doing that in the linked js. I would prefer if I had an option to not include all the (dead) code that can "get the bytes". I am commenting here, because I am using es6 modules, and manually inlining the bytes and killing the code that uses __dirname is an ok solution to the problem of the generated code not working, in my case. If I was using a bundler there, I could do it some other way (like import the wasm file in esbuild, and set it to binary) and then pass that, and it'd be fine, without any "get the bytes" code on the web or in node.

Maybe I am missing something, but I don't see a way to say "generate all the emscripten wrapping and type converters and stuff, and export it as es6 module, but don't worry about the bytes, I can give that to the module, myself".

I can definitely do that by hand with wasmBinary (as I do in the linked js) but it was a super-big pain to wade through, and just exporting a plain function that is looking for those bytes would basically cover 100% of my usecases (es6 libraries that are used on the web or node or deno or bun or whatever.)

For context, I am making a game library that uses wasm for the code that makes the game work (similar to wasm4.) When I compile games to wasm in other languages/ecosystems (like assmeblyscript or rust or nim) there is 0 glue code, just basic type-converters and I hand a function some wasm-bytes. This works the same in hosts in C, node, and the web. Here is an example web-based runtime.

You might like to subscribe to #12184, it seems to be what you're after.

Btw, this issue shouldn't be tagged good-first-bug, because fixing it will require a substantial overhaul of how Emscripten sets up its imports and organises its startup code. There are several ways it can be accomplished, but it really needs direction from one of the project leads. I think #12203 is the main thing that needs to happen.

@curiousdannii I appreciate the link, that does seem to be on at least a similar track. I think maybe a difference is that I want a plain ES6-exported function, that doesn't rely on injecting globals or whatever, that just exports what you need to use the wasm-blob, but you hand it the byte-blob. I'd be happy to make a template for that, I think it's more of a reduction from what there already is, not really an addition.

that doesn't rely on injecting globals

I'm not really sure what you're referring to there, sorry. Nothing in that issue is talking about injecting globals?

I just mean, again, if you look at the linked js, at the very end I do

export { writeAsciiToMemory, UTF8ToString }

so it's possible to actually use the library.

Otherwise it's relying on the fact that ES5 on web has a default window global-scope. ES6 doesn't work like this (and neither does node commonjs) so it won't work like that, but exporting them does.

You should be listing those in EXPORTED_RUNTIME_METHODS so then they'll be exposed on the Module.

+1 for getting this solved. As a WASM library author, it's key that the library work transparently on both browser and server (nodeJS also simplifies CI testing considerably). As is I'll just publish as a CommonJS module, since that works, but I'd prefer to move to a modern ES6 module.

This will be fixed with PR #17915, see #17915 (comment) for the emsdk install instructions to test this.

I don't think this should be closed yet as I don't think web+node builds will work when the output is put through a bundler. #17915 was great progress but it's not enough.

as I don't think web+node builds will work when the output is put through a bundler

Did you try / run into specific issues? As far as I can tell from my usage, it does work, but you need to tweak the bundler config obviously to either ignore or polyfill Node modules (I chose ignoring as it's a better option to save code size, but either one should work).

But then, you have to do that either way. I suggest to track any bundler-specific improvements in separate issues. Native ES6 works now in both Node and browsers, and that's the primary goal.

I haven't tested it. But for it to continue working a bundler would have to be able to output require()s when in ESM output mode, and I've never heard of one that does.

And no, users shouldn't need to configure their bundler beyond selecting ESM output. We need to move to dynamic import().

But for it to continue working a bundler would have to be able to output require()s when in ESM output mode, and I've never heard of one that does.

Hm, why would you want require in your output? Are we both talking about bundling from Node.js code for the browser?

dasa commented

Could this help? https://github.com/rollup/plugins/tree/master/packages/commonjs#transformmixedesmodules

I agree that ideally the JS file should still workin in Node and the browser after being bundled by Rollup, Webpack, etc.

Hm, why would you want require in your output? Are we both talking about bundling from Node.js code for the browser?

Because that's how #17915 works. It will work if you just use the output as is, but if you try to bundle it the bundler will convert the require calls into imports and then it won't work in both web and node.

Could this help? https://github.com/rollup/plugins/tree/master/packages/commonjs#transformmixedesmodules

I agree that ideally the JS file should still workin in Node and the browser after being bundled by Rollup, Webpack, etc.

Oh, interesting. So yes, there is at least one bundler that could handle it.

rojvv commented

Yes, dynamic import() should be used.

image

dasa commented

Could the new Node API process.getBuiltinModule(id) help with this by removing the need to use require or import('module') to load the "fs" and "path" modules.

Doc: https://nodejs.org/docs/latest-v20.x/api/process.html#processgetbuiltinmoduleid

Maybe the code could use this when the min Node version is 20.16 or greater.