titzer/virgil

Why is Virgil so fast?

Closed this issue · 8 comments

I ran a Hello World + Fibonacci benchmark comparing Virgil with Rust and TinyGo (the two most often cited Wasm compilers), and the results seem too good to be true!

Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct — time(1) reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).
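Because time(1) rounds to 1/100s, a 0.00s reading only says the compile took less than about a hundredth of a second. A finer-grained measurement could be taken with something like the Python sketch below. `time_command` is a hypothetical helper, and `EXAMPLE_CMD` is a stand-in for whichever compiler invocation is being measured; neither comes from the attached results.

```python
import subprocess
import sys
import time

# Stand-in command; substitute the real compiler invocation being measured.
EXAMPLE_CMD = [sys.executable, "-c", "pass"]

def time_command(argv, runs=10):
    """Return the best wall-clock time in seconds over `runs` executions of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution monotonic clock
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

print(f"{time_command(EXAMPLE_CMD, runs=3) * 1000:.2f} ms")
```

Taking the minimum over several runs reduces noise from process start-up and disk caching, which matters when the thing being timed finishes in a few milliseconds.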

The Numbers

wasm

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 4.13s | 428,547 | 0.62s |
| rust | 0.33s | 2,054,632 | 1.80s |
| virgil | 0.00s | 8,802 | 0.96s |

wasm-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 3.88s | 191,265 | 0.62s |
| rust | 0.80s | 301,363 | 0.66s |
| virgil | 0.01s | 7,891 | 1.07s |

x86-64-linux

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.94s | 503,640 | 0.39s |
| rust | 0.30s | 3,853,504 | 1.53s |
| virgil | 0.01s | 20,552 | 0.63s |

x86-64-linux-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 2.04s | 140,056 | 0.38s |
| rust | 2.73s | 1,653,736 | 0.32s |
| virgil | 0.01s | 19,552 | 0.64s |

WebAssembly Performance

  • The Virgil compiler is ~50x faster than the Rust compiler and over 300x faster than the TinyGo compiler.
  • The optimised Virgil executable is over 35x smaller than the Rust executable and over 20x smaller than the TinyGo executable.
  • The TinyGo executable runs ~7% faster than the Rust executable and ~83% faster than the Virgil executable (executed on the wasmtime runtime).

x86-64 Performance

  • The Virgil compiler is ~30x faster than the Rust compiler and ~200x faster than the TinyGo compiler.
  • The optimised Virgil executable is ~80x smaller than the Rust executable and ~7x smaller than the TinyGo executable.
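
The size ratios quoted in the two lists above can be recomputed from the optimised tables. The Python sketch below does only that arithmetic; the `RESULTS` layout and the `size_ratio` helper are illustrative names, not part of the benchmark.

```python
# Figures copied from the "wasm-optimised" and "x86-64-linux-optimised" tables
# in the first post: (compile seconds, executable bytes, run seconds).
RESULTS = {
    "wasm-optimised": {
        "go": (3.88, 191_265, 0.62),
        "rust": (0.80, 301_363, 0.66),
        "virgil": (0.01, 7_891, 1.07),
    },
    "x86-64-linux-optimised": {
        "go": (2.04, 140_056, 0.38),
        "rust": (2.73, 1_653_736, 0.32),
        "virgil": (0.01, 19_552, 0.64),
    },
}

def size_ratio(target, other, base="virgil"):
    """How many times larger `other`'s executable is than `base`'s."""
    return RESULTS[target][other][1] / RESULTS[target][base][1]

for target in RESULTS:
    for other in ("rust", "go"):
        print(f"{target}: {other} is {size_ratio(target, other):.1f}x the size of virgil")
```

This gives roughly 38x (Rust) and 24x (TinyGo) for Wasm, and roughly 85x and 7x for x86-64, in line with the bullets. Compile-time ratios are left out on purpose: Virgil's 0.01s sits at the resolution of time(1), so any quotient built on it is coarse.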

Notes

  1. Virgil Wasm code generated with the compiler's -opt=all option ran slower than code generated without it, but the executable was ~10% smaller, so currently there's not a lot to be gained from using -opt=all.

  2. Importing the fmt package increased the size of the TinyGo Wasm executable from 8KB to 191KB (an increase of 183KB), whereas importing the Virgil Strings component increased the size of the Virgil Wasm executable from 3.6KB to 7.9KB (an increase of only 4.3KB).

  3. The compiled Wasm files were executed with wasmtime-cli 0.39.1.

Details

The raw data along with source code and platform information is attached.
go-results.txt
rust-results.txt
virgil-results.txt

Nice!

> Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct — time(1) reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).

Indeed. This goes right along with what we were discussing in #79: v3c (Aeneas) compiles only the source code handed to it; there aren't mountains of source code it hunts through or enormous runtime code binaries that need to be linked in. In fact, the runtime is part of the command-line execution of v3c; the entire compilation step from source to binary is contained in one invocation of the compiler.

Also, the Virgil compiler parses, typechecks, and runs initializers for all code, but it only compiles reachable code from main(). It doesn't go past ASTs for anything not reachable from main or needed to run initializers. The reachability phase feeds into polymorphic specialization, so generic code doesn't get specialized for types it is never used with. Compilation is generally fast because most optimizations don't even iterate on the SSA; for now they are fairly lightweight local optimizations.
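
The reachability idea can be illustrated with a worklist traversal over a call graph. The Python sketch below is a generic illustration and not Aeneas's actual implementation: the function names and the call graph are made up, and each generic instantiation (such as `print<int>`) is modelled as its own node so that instantiations nobody reaches are never "specialized".

```python
from collections import deque

def reachable(call_graph, roots):
    """Return every node reachable from `roots` by following call edges."""
    seen = set(roots)
    work = deque(roots)
    while work:
        fn = work.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen

# Hypothetical program: every function is parsed and typechecked, but only
# the ones reachable from main would go on to code generation.
call_graph = {
    "main": ["fib", "print<int>"],
    "fib": ["fib"],
    "print<int>": ["render<int>"],
    "render<int>": [],
    "debug_dump": ["print<string>"],  # never called from main
    "print<string>": ["render<string>"],
    "render<string>": [],
}

live = reachable(call_graph, ["main"])
print(sorted(live))  # ['fib', 'main', 'print<int>', 'render<int>']
```

Here `debug_dump` and the `string` instantiations are never visited, so a compiler working this way would spend no code-generation time on them and they would add nothing to the binary.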

I'd be interested to see what results you get for x86-linux. Despite that backend being a bit older, it has a register allocator that does better in most situations (but much, much worse in others), so I'd expect the 32-bit code in this example to run even faster.

> it only compiles reachable code from main(). It doesn't go past ASTs for anything not reachable from main or needed to run initializers.

Very clever.

> I'd be interested to see what results you get for x86-linux.

Here you go (I've also included the Virgil JVM results):

  • x86 and x86-64 executables have roughly the same execution times, but the x86 executable is ~10% smaller.
  • JVM execution times are also roughly the same as those for x86 and x86-64, and the JVM output is ~10% smaller than the x86 executable.

The Hello World + Fibonacci app doesn't really exercise the language and is probably not representative of "real world" code; if you have a better benchmark, I could run it.

wasm

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 4.16s | 428,547 | 0.61s |
| rust | 0.33s | 2,054,632 | 1.91s |
| virgil | 0.01s | 8,802 | 0.98s |

wasm-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 3.80s | 191,265 | 0.62s |
| rust | 0.77s | 301,363 | 0.67s |
| virgil | 0.01s | 7,891 | 1.14s |

x86-64-linux

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.91s | 503,640 | 0.39s |
| rust | 0.31s | 3,853,504 | 1.55s |
| virgil | 0.01s | 20,552 | 0.63s |

x86-64-linux-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.94s | 140,056 | 0.41s |
| rust | 2.73s | 1,653,736 | 0.33s |
| virgil | 0.01s | 19,552 | 0.64s |

x86-linux

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.01s | 18,568 | 0.58s |

x86-linux-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.01s | 17,884 | 0.62s |

jvm

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.00s | 17,715 | 0.75s |

jvm-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.00s | 15,543 | 0.60s |

The raw data along with source code, compiler commands and platform information is attached:

virgil-results.txt
rust-results.txt
go-results.txt

That reminds me, I've been meaning to make memory profiling work with the native GC but haven't gotten around to it yet.

I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.
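
The step from a 512MB heap to "less than 256MB" makes sense if the native collector is a semispace (copying) collector, which allocates into only half of the heap between collections. That is an assumption made here to explain the arithmetic; it is not stated in this thread. Under that assumption:

```python
MB = 1024 * 1024

def allocation_budget_without_gc(heap_bytes, semispaces=2):
    """Bytes that can be allocated before a collection is forced, assuming the
    heap is split evenly into `semispaces` regions and only one of them is
    allocated into at a time (an assumption, see the note above)."""
    return heap_bytes // semispaces

print(allocation_budget_without_gc(512 * MB) // MB)  # 256
```

If no collection happened during the self-compile, total allocation stayed below that budget.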

@diakopter I added compilation and execution memory consumption columns to the results:

  • Wasm (wasmtime) runtime memory consumption is ~11 MB for all three compilers.
  • Virgil JVM runtime memory consumption is ~38 MB.
  • Virgil and TinyGo x86-64 runtime memory consumption is neck-and-neck at ~290 KB; Rust consumed ~6x more memory.

For native compilation targets Virgil wins in terms of minimising hardware requirements (executable memory and storage).

The bash commands that generated the results have been added to the attached raw results files.
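
For readers who want to take a comparable peak-memory reading without the attached scripts, one option is to read the child-process resource usage after running a command. The Python sketch below is only an illustration of that approach and is not the method used for the tables; `peak_rss_kb` is a hypothetical helper and the command is a stand-in.

```python
import resource
import subprocess
import sys

def peak_rss_kb(argv):
    """Run argv, then return the largest resident set size of any child
    process waited for so far by this process (kilobytes on Linux)."""
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Stand-in command; substitute the compiler or executable being measured.
print(peak_rss_kb([sys.executable, "-c", "pass"]), "KB")
```

Because `RUSAGE_CHILDREN` reports a running maximum over all children, each command should be measured in a fresh Python process to keep the readings independent. The units of `ru_maxrss` also differ by platform (kilobytes on Linux, bytes on macOS).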

wasm

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 3.74s | 428,547 | 0.62s | 173740 KB | 11888 KB |
| rust | 0.33s | 2,054,632 | 1.86s | 117964 KB | 11540 KB |
| virgil | 0.01s | 8,802 | 1.00s | 5776 KB | 11060 KB |

wasm-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 3.91s | 191,265 | 0.62s | 181480 KB | 11596 KB |
| rust | 0.72s | 301,363 | 0.66s | 154196 KB | 10740 KB |
| virgil | 0.01s | 7,891 | 1.10s | 5744 KB | 11144 KB |

x86-64-linux

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 1.95s | 503,640 | 0.38s | 164452 KB | 296 KB |
| rust | 0.32s | 3,853,504 | 1.55s | 126424 KB | 2028 KB |
| virgil | 0.01s | 20,552 | 0.62s | 6456 KB | 292 KB |

x86-64-linux-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 1.89s | 140,056 | 0.38s | 177796 KB | 292 KB |
| rust | 2.74s | 1,653,736 | 0.32s | 222596 KB | 1844 KB |
| virgil | 0.01s | 19,552 | 0.67s | 6436 KB | 292 KB |

x86-linux

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.01s | 18,568 | 0.63s | 6412 KB | 292 KB |

x86-linux-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.01s | 17,884 | 0.64s | 6428 KB | 292 KB |

jvm

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.00s | 17,715 | 0.68s | 3960 KB | 37516 KB |

jvm-optimised

| Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.00s | 15,543 | 0.60s | 3940 KB | 37540 KB |

virgil-results.txt
rust-results.txt
go-results.txt

> I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.

Virgil sure is parsimonious.

> Virgil sure is parsimonious.

I might like garbage collection but I don't like garbage :-)