conda-forge/conda-forge.github.io

RFC: Add new target: emscripten-wasm32

Adding a new target is a pretty big deal, but there are good reasons to do so.

Background

WASM (WebAssembly) is a powerful way to run essentially any code in a browser, on top of JavaScript. It has proven very successful because it lets users run arbitrary programs in a browser without having to explicitly install anything.

This becomes even more powerful when combined with something like Jupyter, which can serve the notebooks (and save the state) while delegating the actual computations to the client browser, thus allowing massive scalability.

Particularities of WASM as a target:

  • no real OS
  • no real file system
  • need browser (or node.js) to run
  • separate build infrastructure (gcc -> emcc; configure -> emconfigure; make -> emmake; cmake -> emcmake)
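The tool mapping in the last bullet can be sketched as a small command rewriter. This is only an illustration: the function name `to_emscripten` and the two lookup tables are hypothetical, and the sketch assumes that `emcc` replaces the compiler name while `emconfigure`, `emmake` and `emcmake` are prefixed to the otherwise unchanged command.

```python
import shlex

# Drop-in replacement: the compiler driver itself is swapped.
COMPILER_REPLACEMENTS = {"gcc": "emcc"}

# Wrappers: the original command is kept and prefixed with the wrapper.
WRAPPERS = {"configure": "emconfigure", "make": "emmake", "cmake": "emcmake"}


def to_emscripten(command: str) -> str:
    """Rewrite a native build command into its Emscripten counterpart."""
    argv = shlex.split(command)
    if not argv:
        return command
    tool = argv[0].rsplit("/", 1)[-1]  # "./configure" -> "configure"
    if tool in COMPILER_REPLACEMENTS:
        argv[0] = COMPILER_REPLACEMENTS[tool]
    elif tool in WRAPPERS:
        argv.insert(0, WRAPPERS[tool])
    return shlex.join(argv)


for native in ("gcc -O2 -c foo.c", "./configure --prefix=/opt", "make -j4"):
    print(f"{native!r:35} -> {to_emscripten(native)!r}")
```

Commands that match neither table are returned unchanged, so a build script could be passed through line by line.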

Consequences – Overall

Since everything lives in the browser (i.e. we don't have a full-fledged OS), we need to do some extra work, like installing an environment elsewhere and mounting it into a virtual filesystem. In particular, this means we need a build/target split for the test environment as well: one architecture that runs the required infrastructure for the headless browser (matching the CI agent), and one where we actually install the emscripten-wasm32 package.

Consequences – in more detail

Nomenclature:

  • In the JavaScript world, a "main module" roughly corresponds to what we consider a binary, and a "side module" to a shared library; more details here and here

Packaging:

  • Binaries generally come as a split between a foo.wasm file and a foo.js file, where the JavaScript part can roughly be considered to set up the scaffolding (virtual filesystem etc.).
  • CMake doesn't support this split and does not copy both files natively; this needs either modifications to CMakeLists.txt (inserting an "if emscripten: copy...") or handling in the build script directly.
  • The way node loads a binary sets up some global state (e.g. related to the allocator) that gets clobbered if several binaries are loaded. This then results in segfaults and other weird behaviour. To counteract this, we need to compile with -sMODULARIZE=1.
  • By default (or at least in the most widespread configuration), WASM is a 32-bit target. To use large (64-bit) ints, we need to set -sWASM_BIGINT globally.
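The "handle it in the build script" option and the two flags above could look roughly like the following sketch. The helper names `add_emscripten_flags` and `install_main_module` are hypothetical, and whether the flags belong in the compile flags, the link flags, or both is left open here.

```python
import shutil
import tempfile
from pathlib import Path

# Flags named in the list above.
EMSCRIPTEN_FLAGS = ["-sMODULARIZE=1", "-sWASM_BIGINT"]


def add_emscripten_flags(flags):
    """Return a copy of flags with the Emscripten flags appended once."""
    result = list(flags)
    for flag in EMSCRIPTEN_FLAGS:
        if flag not in result:
            result.append(flag)
    return result


def install_main_module(build_dir, name, bin_dir):
    """Copy both halves of a binary (name.js and name.wasm) into bin_dir."""
    build_dir, bin_dir = Path(build_dir), Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for suffix in (".js", ".wasm"):
        src = build_dir / f"{name}{suffix}"
        if not src.is_file():
            raise FileNotFoundError(f"missing {src.name}; both files are needed")
        installed.append(Path(shutil.copy2(src, bin_dir / src.name)))
    return installed


# Usage with throwaway directories standing in for the build tree and the prefix.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp, "build")
    build.mkdir()
    (build / "foo.js").write_text("// scaffolding\n")
    (build / "foo.wasm").write_bytes(b"\0asm")
    installed = install_main_module(build, "foo", Path(tmp, "prefix", "bin"))
    installed_names = [p.name for p in installed]
```

Failing loudly when one of the two files is missing is a deliberate choice in this sketch, since a foo.js without its foo.wasm (or the reverse) cannot be run.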

Runtime:

  • The default is to run WASM from a headless browser, but it's also possible to use node.js (which can help for testing compiled binaries in particular)
  • There's also https://wasi.dev, which uses an interface to the host system to run WASM (rather than the browser), but at last check that didn't support shared libraries.

Filesystem:

  • Unlike in containers, the filesystem in WASM isn't "mounted"; it is a one-way copy (during setup) to the JavaScript-based virtual filesystem
  • There is also a "WasmFS" in emscripten
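The difference between a mount and a one-way copy can be modelled in a few lines. This is a conceptual sketch only: it does not use any Emscripten API, and `snapshot_tree` is a hypothetical helper that stands in for the copy performed during setup.

```python
import tempfile
from pathlib import Path


def snapshot_tree(root):
    """Read every file under root into an in-memory mapping (a one-way copy)."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): path.read_bytes()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


with tempfile.TemporaryDirectory() as tmp:
    host = Path(tmp)
    (host / "lib").mkdir()
    (host / "lib" / "data.txt").write_text("v1")

    virtual_fs = snapshot_tree(host)  # the copy happens once, at setup

    # A later change on the host side is not seen through the copy ...
    (host / "lib" / "data.txt").write_text("v2")
    seen_in_sandbox = virtual_fs["lib/data.txt"]

    # ... and a file created in the copy never reaches the host.
    virtual_fs["lib/new.txt"] = b"created in the sandbox"
    leaked_to_host = (host / "lib" / "new.txt").exists()
```

With a real mount, both sides would observe each other's changes; here neither does, which is what matters for test environments that expect to read freshly written files.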

No subprocesses:

  • By default, it's not possible to simply fork or call a subprocess. There are some workarounds, but those need care, as the JavaScript world does this asynchronously while the Python side is synchronous. For the time being, we should not assume that we'll be able to package things that need to execute subprocesses.
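For Python packages, one way to cope with this is to guard any subprocess use behind a platform check. The sketch below assumes that `sys.platform` reports "emscripten" (or "wasi") on these targets, as recent CPython builds do; the helper names `can_spawn_subprocesses` and `run_tool` are hypothetical.

```python
import sys

# sys.platform values for which subprocesses are assumed to be unavailable.
NO_SUBPROCESS_PLATFORMS = {"emscripten", "wasi"}


def can_spawn_subprocesses(platform=None):
    """Return False on targets where fork/exec cannot be relied upon."""
    platform = sys.platform if platform is None else platform
    return platform not in NO_SUBPROCESS_PLATFORMS


def run_tool(argv, platform=None):
    """Run an external tool, failing early and clearly where that is impossible."""
    if not can_spawn_subprocesses(platform):
        raise RuntimeError(f"cannot run {argv[0]!r}: no subprocess support on this target")
    import subprocess  # imported lazily so the module still loads under WASM

    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

A test suite could use the same check to skip tests that shell out, rather than failing them on emscripten-wasm32.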

Current state

There is https://emscripten-forge.org/, which already publishes a handful of packages to a separate channel, from an open-source monorepo of recipes.

References:

Other things in this space:

Vision

  • Build as many packages as possible for emscripten-wasm32, as for any other architecture, so that it becomes a "regular target" in conda-forge
  • Serve 1000s of notebooks from a single server (because each consuming browser takes care of the actual execution).

Proposed Ramp-up (duration is open question)

  • Support running in all major browsers
  • Cross-compile from Linux to emscripten
  • Only build for a single python version initially
  • Only publish to a specific label, not conda-forge main

Tasks