Ocaml to WASM - an Overview

This is a collection of links and ideas and notes on compiling OCaml to WASM.

General Notes

3- A language implementation like OCaml breaks down in four big parts:
        1- Front-end compiler
        2- Back-end compiler and code emitter
        3- Run-time system
        4- OS interface

[...]

6- Here is a schematic of the Caml compiler.  (Use a fixed-width font.)

             |
             | parsing and preprocessing
             v
          Parsetree (untyped AST)
             |
             | type inference and checking
             v
          Typedtree (type-annotated AST)
             |
             | pattern-matching compilation, elimination of modules, classes
             v
          Lambda
           /  \
          /    \ closure conversion, inlining, uncurrying,
         v      \  data representation strategy
      Bytecode   \
                  \
                 Cmm
                  |
                  | code generation
                  v
               Assembly code

A more recent high-level view of the compilation pipeline (from https://ocamllabs.slack.com/archives/C0JCHGE78/p1568626615023800, Sep 16, 2019):

 Source code
   |
   | parsing
   v
 Parsetree
   |
   | typing
   v
 Typedtree
   |
   | desugar pattern matching, modules, objects, etc; erase types,
   | make explicit memory layout in terms of blocks and values
   |
   v
 Lambda (higher order lambda calculus based IR)
   |
   | make closure construction and usage explicit
   | perform inlining
   |
   v
 Clambda (like Lambda but with explicit closures, direct/indirect calls)
   |
   | make block/value manipulation explicit
   | make allocation explicit
   |
   v
  Cmm (tree-structured, explicit memory manipulation, C calls, etc)
   |
   | perform instruction selection,
   | sequentialization into basic blocks,
   | assignment of pseudo-registers
   |
   v
  Mach (block structured IR)
   |
   | liveness, register allocation, dead code elimination
   | are Mach -> Mach transformations
   |
   v
 Linear (linear sequence of abstract assembly instructions, explicit register assignments)
   |
   | this step is heavily backend-dependent, implemented in `emit.mlp`
   |
   v
 Textual assembly code

Runtime / Garbage Collection

Both OCaml bytecode and OCaml native code comes with a runtime that provides functions needed to run the compiled program.

The runtime provides
  • a copying garbage collector, the caml_alloc-functions

  • the caml_apply-functions that implement curried function application

  • binary and unary operations on tagged integers/floats

  • exception handling

TODO: find out what else the runtime does.

Dealing with OCaml values' lifetimes in WASM
  • gc1) "Manual" garbage collection on the WASM linear memory, by maintaining a stack in WASM linear memory and porting the garbage collector to WASM

    This is apparently what the WASM backend of the Go language does.

  • gc2) Use the WASM garbage collector extension to allocate and manage OCaml values

    The WASM garbage collector specification is still at a very early stage, and it is not clear how exactly it will work. The only reasonable way, at this point in time, to attempt this is to prototype our own WASM GC extension.

    What if, while browser support is not there yet, we could compile to WASM+GC and then compile WASM+GC to WASM?

  • gc3) Allocate heap objects on "the JavaScript side of the world" via Reference Types

    This is more or less a work-around for not having a WASM GC extension.

  • gc4) Create a version of OCaml that has static lifetimes, similar to Rust.

    I will not do this, since pretty much all existing OCaml code would need to be rewritten in order to be compiled to WASM with this variant of the language. Also, this may be different enough to OCaml that this is essentially a new programming language.

Paths to WASM

Direct
  • 1a) translate Lambda → WASM

  • 1b) translate Cmm → WASM

  • 1c) translate OCaml bytecode → WASM

  • 1d) run a bytecode interpreter for OCaml in WASM

Indirect
  • 2a) Cmm → LLVM → WASM

  • 2b) OCaml bytecode → LLVM → WASM

  • 2c) Ocaml → machine code → WASM

Direct Roads to WASM

1a) Lambda → WASM

While there are currently no projects that translate OCaml’s lambda IR to WASM, there are these:

1b) Cmm → WASM

Starting from an already optimized version of the program is likely to result in a comparatively fast execution speed.

Generally, it appears that Cmm is a good starting point when compiling to WASM without using the WASM GC extension, since the memory representation has already been flattened at the Cmm stage.

1c) OCaml bytecode → WASM

I am not aware of any projects that attempt translating from OCaml bytecode to WASM. Please let me know if you are.

An advantage is that the bytecode interpreter hardly ever changes at all (it is said to still be quite similar to what is laid out in the original report on ZINC).

There is no dependency on compiler internals, as we can work on the bytecode output of ocamlc.

In the past, translating bytecode has proven to be a successful and maintainable strategy for compiling OCaml to different languages:

  • [production-ready] OCaml bytecode → JavaScript: https://github.com/ocsigen/js_of_ocaml

    https://www.irif.fr/~balat/publications/vouillon_balat-js_of_ocaml.pdf presents performance results from 2011: The code generated by js_of_ocaml running on the V8 JavaScript engine was faster than running the bytecode interpreter on the bytecode generated by ocamlc, and slower than the machine code generated by ocamlopt. Exceptions turned out to be very expensive.

    js_of_ocaml is being used in production systems, as far as I know, it is currently the best tool to compile OCaml to JavaScript.

    OCaml values are allocated on the JavaScript heap (gc3), thus, the calls to the garbage collector are just stubs: https://github.com/ocsigen/js_of_ocaml/blob/e7a34b8e0697a34b235ff121132c72121c16798d/runtime/gc.js

    Note: It is unlikely, that exceptions will be an issue when compiling to WASM, since the exception mechanism in WASM will be different from the one in JavaScript.

  • [inactive] Ocaml bytecode → C: https://github.com/bvaugon/ocamlcc

    According to http://michel.mauny.net/data/papers/mauny-vaugon-ocamlcc-oud2012.pdf, performance in 2012 was better than running the bytecode interpreter, and worse than running the machine code generated by ocamlopt, which essentially was to be expected. However, this comes at the cost of having large executables, roughly up to twice the size of machine code in the considered examples.

    I managed to compile this using an older version of the OCaml compiler.

    I can compile trivial test programs to C.

    Compiling that C code using Emscripten to WASM, I am stuck with this error on the JavaScript console:

    exception thrown: RuntimeError: index out of bounds,_caml_page_table_modify@http://127.0.0.1:8000/output.js:45026:1
    _caml_page_table_add@http://127.0.0.1:8000/output.js:44203:1
    _caml_set_minor_heap_size@http://127.0.0.1:8000/output.js:89253:1
    _caml_init_gc@http://127.0.0.1:8000/output.js:90849:1
    _caml_main@http://127.0.0.1:8000/output.js:99291:1
    _main@http://127.0.0.1:8000/output.js:110038:1
    Module._main@http://127.0.0.1:8000/output.js:6717:10
    callMain@http://127.0.0.1:8000/output.js:7005:15
    doRun@http://127.0.0.1:8000/output.js:7064:23
    run/<@http://127.0.0.1:8000/output.js:7075:7

    I’m having trouble debugging this because I don’t have source maps for the C files where the _caml_-functions come from. The reason seems to be that the files aren’t actually included, only the headers. So I need to figure out what parameters to provide to emcc. In order to do that, I need to figure out what parameters ocamlcc uses to compile the code with gcc.

    I was able to get the parameters from ocamlcc by using the -verbose option, now the error is this:

    shared:ERROR: emcc: cannot find library "curses"

    While I could continue here, I think that this is a dead end due to the large code size.

1d) bytecode interpreter in WASM

Indirect Roads to WASM

If there was a compiler from OCaml to LLVM, it would immediately enable compilation to WASM.

2b) OCaml bytecode → LLVM

2c) machine code → WASM

For compiling machine code to WASM, there apparently do not currently exist any solutions.

One would need to apply some kind of algorithm that transforms the control flow from a program-counter-based representation to the labeled continuations that can be seen in WASM, just like Emscripten’s "Relooper" algorithm does for LLVM.

If there is an architecture whose machine code can be translated to WASM in a reasonably efficient fashion, and it turns out that OCaml already compiles to this architecture, this could be interesting.

If successful, this could, in the long run, help getting many other languages to compile to WASM as well.

My Collection of Links to Sort Through