Why is Virgil so fast?
I ran a Hello World + Fibonacci benchmark comparing Virgil with Rust and TinyGo (the two most often cited Wasm compilers); the results seem too good to be true!
Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct: `time(1)` reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).
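A compile time that rounds to 0.00s can still be measured with a higher-resolution clock. The following is a minimal sketch, not the method used for the numbers in this post: it times a command with Python's `time.perf_counter` and keeps the best of several runs. The command shown is a stand-in (the Python interpreter itself) so the sketch runs anywhere; the real compiler invocation would have to be substituted.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Return the best wall-clock time in seconds over several runs of cmd."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in command (assumption): replace with the actual compiler command line.
cmd = [sys.executable, "-c", "pass"]
elapsed = time_command(cmd)
print(f"best of 5 runs: {elapsed * 1000:.2f} ms")
```

Taking the minimum rather than the mean reduces the influence of one-off noise such as cold file caches.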
The Numbers
wasm
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 4.13s | 428,547 | 0.62s
rust | 0.33s | 2,054,632 | 1.80s
virgil | 0.00s | 8,802 | 0.96s
wasm-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 3.88s | 191,265 | 0.62s
rust | 0.80s | 301,363 | 0.66s
virgil | 0.01s | 7,891 | 1.07s
x86-64-linux
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 1.94s | 503,640 | 0.39s
rust | 0.30s | 3,853,504 | 1.53s
virgil | 0.01s | 20,552 | 0.63s
x86-64-linux-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 2.04s | 140,056 | 0.38s
rust | 2.73s | 1,653,736 | 0.32s
virgil | 0.01s | 19,552 | 0.64s
WebAssembly Performance
- The Virgil compiler is ~50x faster than the Rust compiler and over 300x faster than the TinyGo compiler.
- The optimised Virgil executable is over 35x smaller than the Rust executable and over 20x smaller than the TinyGo executable.
- The optimised TinyGo executable runs ~7% faster than the Rust executable and ~73% faster than the Virgil executable (0.62s vs 0.66s and 1.07s, executed on the wasmtime runtime).
x86-64 Performance
- The Virgil compiler is ~30x faster than the Rust compiler and ~200x faster than the TinyGo compiler.
- The optimised Virgil executable is ~80x smaller than the Rust executable and ~7x smaller than the TinyGo executable.
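The size ratios quoted above can be reproduced from the optimised tables. This is a small sketch with the executable sizes copied from those tables; compile-time ratios are left out because the 0.01s Virgil figure sits at the resolution limit of the timer.

```python
# Optimised executable sizes in bytes, copied from the tables in this post.
wasm_sizes = {"go": 191_265, "rust": 301_363, "virgil": 7_891}
x86_64_sizes = {"go": 140_056, "rust": 1_653_736, "virgil": 19_552}


def size_ratios(sizes, baseline="virgil"):
    """Return how many times larger each executable is than the baseline."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items() if name != baseline}


for label, sizes in (("wasm", wasm_sizes), ("x86-64", x86_64_sizes)):
    for name, ratio in size_ratios(sizes).items():
        print(f"{label}: {name} executable is {ratio:.1f}x the size of virgil's")
```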
Notes
- Virgil Wasm code generated with the compiler `-opt=all` option ran slower than without it but the executable size was ~10% smaller, so currently there's not a lot to be gained using the `-opt=all` option.
- Importing the `fmt` package increased the size of the TinyGo Wasm executable from 8KB to 191KB (an increase of 183KB), whereas importing the Virgil `Strings` component increased the size of the Virgil Wasm executable from 3.6KB to 7.9KB (an increase of only 4.3KB).
- The compiled Wasm files were executed with `wasmtime-cli 0.39.1`.
Details
The raw data along with source code and platform information is attached.
go-results.txt
rust-results.txt
virgil-results.txt
Nice!
> Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct: time(1) reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).
Indeed. This goes right along with what we were discussing in #79: `v3c` (Aeneas) compiles only the source code handed to it; there aren't mountains of source code it hunts through or enormous runtime code binaries that need to be linked in. In fact, the runtime is part of the command-line execution of `v3c`; the entire compilation step from source to binary is contained in one invocation of the compiler.
Also, the Virgil compiler parses, typechecks, and runs initializers for all code, but it only compiles reachable code from `main()`. It doesn't go past ASTs for anything not reachable from main or needed to run initializers. The reachability phase feeds into polymorphic specialization, so generic code doesn't get specialized in ways that it isn't used. Compilation is generally fast because most optimizations don't even iterate on the SSA. They are fairly lightweight local optimizations on SSA for now.
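The idea of compiling only what is reachable from `main()` can be illustrated with a worklist traversal over a call graph. This is a generic sketch, not Aeneas's actual implementation: the function names and the dictionary-based call graph are hypothetical, and a real compiler would also have to account for initializers, virtual dispatch and the instantiations needed for polymorphic specialization.

```python
from collections import deque


def reachable(call_graph, roots):
    """Return the set of functions reachable from roots via a worklist traversal."""
    seen = set(roots)
    work = deque(roots)
    while work:
        fn = work.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen


# Hypothetical program: every name here is illustrative only.
call_graph = {
    "main": ["fib", "print_int"],
    "fib": ["fib"],
    "print_int": ["put_char"],
    "unused_helper": ["format_float"],
    "format_float": [],
}

live = reachable(call_graph, ["main"])
print(sorted(live))  # ['fib', 'main', 'print_int', 'put_char']
```

Only the functions in `live` would be handed to the back end; `unused_helper` and `format_float` would be parsed and typechecked but never compiled.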
I'd be interested to see what results you get for `x86-linux`. Despite that backend being a bit older, it has a register allocator that does better in most situations (but much, much worse in others), so I'd expect the 32-bit code in this example to run even faster.
> it only compiles reachable code from `main()`. It doesn't go past ASTs for anything not reachable from main or needed to run initializers.
Very clever.
> I'd be interested to see what results you get for x86-linux.
Here you go (I've also included the Virgil JVM results):
- x86 and x86-64 executables have roughly the same execution times, but the x86 executable is ~10% smaller.
- JVM execution times are roughly the same as x86 and x86-64, and the JVM executables are ~10% smaller than the x86 executables.
The Hello World + Fibonacci app doesn't really exercise the language and is probably not representative of "real world" code; if you have a better benchmark, I could run it.
wasm
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 4.16s | 428,547 | 0.61s
rust | 0.33s | 2,054,632 | 1.91s
virgil | 0.01s | 8,802 | 0.98s
wasm-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 3.80s | 191,265 | 0.62s
rust | 0.77s | 301,363 | 0.67s
virgil | 0.01s | 7,891 | 1.14s
x86-64-linux
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 1.91s | 503,640 | 0.39s
rust | 0.31s | 3,853,504 | 1.55s
virgil | 0.01s | 20,552 | 0.63s
x86-64-linux-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | 1.94s | 140,056 | 0.41s
rust | 2.73s | 1,653,736 | 0.33s
virgil | 0.01s | 19,552 | 0.64s
x86-linux
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | | | |
rust | | | |
virgil | 0.01s | 18,568 | 0.58s
x86-linux-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | | | |
rust | | | |
virgil | 0.01s | 17,884 | 0.62s
jvm
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | | | |
rust | | | |
virgil | 0.00s | 17,715 | 0.75s
jvm-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs)
---|---|---|---
go | | | |
rust | | | |
virgil | 0.00s | 15,543 | 0.60s
The raw data along with source code, compiler commands and platform information is attached:
That reminds me, I've been meaning to make memory profiling work with the native GC but haven't gotten around to it yet.
I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.
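The step from a 512MB heap to "less than 256MB allocated" follows if only half of the heap is available for allocation between collections, as in a semispace copying collector. That layout is an assumption made here for illustration; the comment above does not state it. Under that assumption the bound is simple arithmetic:

```python
def max_alloc_without_gc_mb(heap_mb, semispaces=2):
    """Upper bound on total allocation before the first collection is forced."""
    # Assumption: the heap is divided into equal semispaces and only one of
    # them is available for allocation between collections.
    return heap_mb // semispaces


print(max_alloc_without_gc_mb(512))  # 256
```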
@diakopter I added compilation and execution memory consumption columns to the results:
- Wasm (`wasmtime`) runtime memory consumption is ~11 MB for all three compilers.
- Virgil JVM runtime memory consumption is ~38 MB.
- Virgil and TinyGo x86-64 runtime memory consumption is neck-and-neck at ~290 KB; Rust consumed ~6x more memory.
For native compilation targets Virgil wins in terms of minimising hardware requirements (executable memory and storage).
The bash commands that generated the results have been added to the attached raw results files.
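For readers without the attachments, peak memory of a compile or run step can be approximated as follows. This is an illustrative sketch and not the commands from the attached files: it reads the peak resident set size of child processes through Python's `resource` module (Unix only). The command is a stand-in (the Python interpreter itself), to be replaced with the real compiler or executable.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd and return the peak resident set size reported for child processes.

    ru_maxrss is in kilobytes on Linux (bytes on macOS). RUSAGE_CHILDREN reports
    the high-water mark across all children waited for so far, so each command
    should be measured in a fresh Python process for clean numbers.
    """
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in command (assumption): replace with the compiler or benchmark binary.
peak = peak_child_rss([sys.executable, "-c", "pass"])
print(f"peak RSS: {peak} (KB on Linux)")
```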
wasm
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | 3.74s | 428,547 | 0.62s | 173740 KB | 11888 KB
rust | 0.33s | 2,054,632 | 1.86s | 117964 KB | 11540 KB
virgil | 0.01s | 8,802 | 1.00s | 5776 KB | 11060 KB
wasm-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | 3.91s | 191,265 | 0.62s | 181480 KB | 11596 KB
rust | 0.72s | 301,363 | 0.66s | 154196 KB | 10740 KB
virgil | 0.01s | 7,891 | 1.10s | 5744 KB | 11144 KB
x86-64-linux
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | 1.95s | 503,640 | 0.38s | 164452 KB | 296 KB
rust | 0.32s | 3,853,504 | 1.55s | 126424 KB | 2028 KB
virgil | 0.01s | 20,552 | 0.62s | 6456 KB | 292 KB
x86-64-linux-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | 1.89s | 140,056 | 0.38s | 177796 KB | 292 KB
rust | 2.74s | 1,653,736 | 0.32s | 222596 KB | 1844 KB
virgil | 0.01s | 19,552 | 0.67s | 6436 KB | 292 KB
x86-linux
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | | | | | |
rust | | | | | |
virgil | 0.01s | 18,568 | 0.63s | 6412 KB | 292 KB
x86-linux-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | | | | | |
rust | | | | | |
virgil | 0.01s | 17,884 | 0.64s | 6428 KB | 292 KB
jvm
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | | | | | |
rust | | | | | |
virgil | 0.00s | 17,715 | 0.68s | 3960 KB | 37516 KB
jvm-optimised
Compiler | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory
---|---|---|---|---|---
go | | | | | |
rust | | | | | |
virgil | 0.00s | 15,543 | 0.60s | 3940 KB | 37540 KB
> I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.
Virgil sure is parsimonious.
> Virgil sure is parsimonious.
I might like garbage collection but I don't like garbage :-)