A Zig coroutine library

Primary language: Zig. License: BSD Zero Clause License (0BSD).

zigcoro

Async Zig as a library using stackful asymmetric coroutines.

Supports async IO via libxev.


Branch main tested against Zig 0.14.0-dev.32+4aa15440c

Coroutines supported on Windows x86_64, Linux {x86_64, aarch64, riscv64}, and Mac {x86_64, aarch64}.

Async IO supported on Linux {x86_64, aarch64} and Mac {x86_64, aarch64}.

Depend

build.zig.zon

.zigcoro = .{
  .url = "git+https://github.com/rsepassi/zigcoro#<commit hash>",
  .hash = "<hash>",
},

build.zig

const libcoro = b.dependency("zigcoro", .{}).module("libcoro");
// Zig 0.12 and later; on Zig 0.11 this was my_lib.addModule("libcoro", libcoro).
my_lib.root_module.addImport("libcoro", libcoro);

Current status

Updated 2024/04/23

Alpha.

Async/await, suspend/resume, Channels, and async IO are all functional and tested in CI.

See future work for more.

Coroutine API

// High-level API
xasync(func, args, stack)->FrameT
xawait(FrameT)->T
xframe()->Frame
xresume(frame)
xsuspend()
xsuspendBlock(func, ptr)

Channel(T, .{.capacity = n})
  init(Executor)
  send(T)
  recv->?T
  close()
Executor
  init()
  runSoon(Func)
  tick()->bool

// Optional thread-local environment
initEnv

// Low-level API
// Note: Frame = *Coro, FrameT = CoroT
Coro
  getStorage(T)->*T
CoroT(func, opts)
  frame()->Frame
  xnextStart(frame)->YieldT
  xnext(frame, inject)->YieldT
  xnextEnd(frame, inject)->ReturnT
  xyield(yield)->InjectT
  xreturned(frame)->ReturnT

// Stack utilities
stackAlloc(allocator, size)->[]u8
remainingStackSize()->usize
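
A minimal sketch of how the high-level calls might fit together, pieced together only from the signatures listed above and the translation table later in this document. It assumes that xasync runs the coroutine until its first xsuspend (as Zig's `async` did), that stackAlloc accepts an explicit size in bytes, and that the module is imported under the name "libcoro". The function addTwice and the 32 KiB stack size are made up for illustration; the repository's tests remain the authoritative reference.

```zig
const std = @import("std");
const libcoro = @import("libcoro");

// Hypothetical coroutine body: bumps the counter, hands control back
// to whoever resumed it, then bumps the counter again and returns it.
fn addTwice(x: *usize) usize {
    x.* += 1;
    libcoro.xsuspend();
    x.* += 1;
    return x.*;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Explicitly allocated stack for the coroutine (size is an assumption).
    const stack = try libcoro.stackAlloc(allocator, 32 * 1024);
    defer allocator.free(stack);

    var x: usize = 0;

    // Assumption: xasync starts addTwice and returns at its first xsuspend.
    const frame = try libcoro.xasync(addTwice, .{&x}, stack);

    // Resume past the suspension point so the coroutine runs to completion.
    libcoro.xresume(frame);

    // The coroutine has finished, so xawait only collects the return value.
    const result = libcoro.xawait(frame);
    std.debug.print("result = {d}\n", .{result});
}
```

The sketch resumes the coroutine to completion before calling xawait, so it does not rely on an Executor being set up.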

Async IO API

libcoro.asyncio provides coroutine-based async IO built on the libxev event loop, with coroutine-friendly wrappers for all of libxev's high-level async APIs.

See test_aio.zig for usage examples.

// Executor
Executor
  init(loop)

// Top-level coroutine execution
run

// Optional thread-local environment
initEnv

// IO
sleep
TCP
  accept
  connect
  read
  write
  close
  shutdown
UDP
  read
  write
  close
Process
  wait
File
  read
  pread
  write
  pwrite
  close
AsyncNotification
  wait

Switching to Zig's async/await

Switching to Zig's built-in async/await should be a mechanical translation whenever it's ready.

Translation (zigcoro then Zig):

// async
var frame = try xasync(func, args, stack);
var frame = async func(args);

// await
const res = xawait(frame);
const res = await frame;

// @frame
var frame = xframe();
var frame = @frame();

// suspend
xsuspend();
suspend {}

// suspend block
xsuspendBlock(func, args);
suspend { func(args); }

// resume
xresume(frame);
resume frame;

// nosuspend
asyncio.run(loop, func, args, stack)
nosuspend asyncio.run(loop, func, args, null)

// xev IO
// No changes needed to the calls
try asyncio.sleep(loop, 10);

The above assumes the Zig async API that was available in Zig 0.10.1, which I expect (but do not know) to be similar in 0.12.0.

Performance

I've done some simple benchmarking on the cost of context switching and on pushing the number of coroutines. Further investigations on performance would be most welcome, as well as more realistic benchmarks.

Context switching

This benchmark measures the cost of a context switch from one coroutine to another by bouncing back and forth between two coroutines millions of times.

From a run on an AMD Ryzen Threadripper PRO 5995WX:

> zig env | grep target
 "target": "x86_64-linux.5.19...5.19-gnu.2.19"

> zig build benchmark -- --context_switch
ns/ctxswitch: 7

From a run on an M1 Mac Mini:

> zig env | grep target
 "target": "aarch64-macos.13.5...13.5-none"

> zig build benchmark -- --context_switch
ns/ctxswitch: 17

Coroutine count

This benchmark spawns a number of coroutines and iterates through them, bouncing control back and forth and periodically logging the cost of context switching. As you increase the number of coroutines, you will eventually hit either a performance cliff or an out-of-memory error. Where that happens depends heavily on the amount of free memory on the system.

Note also that zigcoro's default stack size is 4096 B, which is the typical size of a single page on many systems.

From a run on an AMD Ryzen Threadripper PRO 5995WX:

> zig env | grep target
 "target": "x86_64-linux.5.19...5.19-gnu.2.19"

> cat /proc/meminfo | head -n3
MemTotal:       527970488 kB
MemFree:        462149848 kB
MemAvailable:   515031792 kB

> zig build benchmark -- --ncoros 1_000_000
Running benchmark ncoros
Running 1000000 coroutines for 1000 rounds
ns/ctxswitch: 57
...

> zig build benchmark -- --ncoros 100_000_000
Running benchmark ncoros
Running 100000000 coroutines for 1000 rounds
ns/ctxswitch: 57
...

> zig build benchmark -- --ncoros 200_000_000
Running benchmark ncoros
Running 200000000 coroutines for 1000 rounds
error: OutOfMemory

From a run on an M1 Mac Mini:

> zig env | grep target
 "target": "aarch64-macos.13.5...13.5-none"

> system_profiler SPHardwareDataType | grep Memory
  Memory: 8 GB

> zig build benchmark -- --ncoros 800_000
Running benchmark ncoros
Running 800000 coroutines for 1000 rounds
ns/ctxswitch: 26
...

> zig build benchmark -- --ncoros 900_000
Running benchmark ncoros
Running 900000 coroutines for 1000 rounds
ns/ctxswitch: 233
...
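
As a rough sanity check on the runs above, the memory needed for coroutine stacks alone can be estimated as the coroutine count times the 4096 B default stack size. The sketch below does that arithmetic for the counts used in the benchmarks. It is a lower bound only: it assumes one default-sized stack per coroutine and ignores any other per-coroutine or allocator overhead. The names stack_size and stackBytes are illustrative and not part of zigcoro.

```zig
const std = @import("std");

/// Default per-coroutine stack size in bytes, taken from the note above.
const stack_size: u64 = 4096;

/// Lower bound on the memory needed for `n` coroutine stacks, in bytes.
fn stackBytes(n: u64) u64 {
    return n * stack_size;
}

pub fn main() void {
    // Coroutine counts used in the benchmark runs above.
    const counts = [_]u64{ 800_000, 900_000, 1_000_000, 100_000_000, 200_000_000 };
    for (counts) |n| {
        const bytes = stackBytes(n);
        // /proc/meminfo reports kB in units of 1024 bytes, so divide by 1024.
        std.debug.print("{d} coroutines -> {d} kB of stack\n", .{ n, bytes / 1024 });
    }
}
```

By this estimate, 100,000,000 coroutines need about 400,000,000 kB of stack, which fits within the 515,031,792 kB reported as available on the Ryzen machine, while 200,000,000 coroutines need about 800,000,000 kB, more than the 527,970,488 kB total, which is consistent with the OutOfMemory error. On the 8 GB Mac, the stacks for 900,000 coroutines come to only about 3,600,000 kB, so stack size alone does not account for the slowdown seen there.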

Stackful asymmetric coroutines

  • Stackful: each coroutine has an explicitly allocated stack and suspends/yields preserve the entire call stack of the coroutine. An ergonomic "stackless" implementation would require language support and that's what we expect to see with Zig's async functionality.
  • Asymmetric: coroutines are nested such that there is a "caller"/"callee" relationship, starting with a root coroutine per thread. The caller is the parent: when the callee (the child coroutine) completes, control transfers back to the caller. Intermediate yields/suspends transfer control to the coroutine that most recently resumed the one yielding.
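
To make the two properties above concrete, here is a small sketch in which the suspension happens inside a nested helper call rather than in the coroutine's entry function. It uses only calls named in the API listing and makes the same assumptions as any use of them here: xasync runs until the first xsuspend, stackAlloc takes a size in bytes, and the module is imported as "libcoro". The functions helper and child and the 64 KiB stack size are hypothetical.

```zig
const std = @import("std");
const libcoro = @import("libcoro");

// An ordinary function, one call level below the coroutine entry point.
// Stackful: suspending here preserves child()'s frame as well as this one.
fn helper(counter: *usize) void {
    counter.* += 1;
    libcoro.xsuspend(); // asymmetric: control returns to whoever resumed us
    counter.* += 1;
}

// Coroutine entry point: two helper calls, so two suspension points.
fn child(counter: *usize) void {
    helper(counter);
    helper(counter);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const stack = try libcoro.stackAlloc(allocator, 64 * 1024);
    defer allocator.free(stack);

    var counter: usize = 0;

    // Assumption: runs child until the xsuspend inside the first helper call.
    const frame = try libcoro.xasync(child, .{&counter}, stack);

    // Each resume continues from inside helper, in the middle of child.
    libcoro.xresume(frame); // finishes helper #1, suspends inside helper #2
    libcoro.xresume(frame); // finishes helper #2; child returns to this caller

    libcoro.xawait(frame);
    std.debug.print("counter = {d}\n", .{counter});
}
```

A stackless design would need every function on the path from child to the suspension point to be marked or transformed as suspendable; here helper is a plain function call.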

The wonderful 2009 paper "Revisiting Coroutines" describes the power of stackful asymmetric coroutines in particular and their various applications, including nonblocking IO.

Future work

Contributions welcome.

  • Documentation, code comments, examples
  • Improve/add allocators for reusable stacks (e.g. Buddy allocator)
  • Concurrent execution helpers (e.g. xawaitAsReady)
  • Add support for cancellation and timeouts
  • More aggressive stack reclamation
  • Libraries
    • TLS, HTTP, WebSocket
    • Actors
    • Recursive data structure iterators
    • Parsers
  • Multi-threading support
  • Alternative async IO loops (e.g. libuv)
  • Debugging
    • Coro names
    • Tracing tools
    • Verbose logging
    • Dependency graphs
    • Detect incomplete coroutines
    • ASAN, TSAN, Valgrind support
  • C API
  • Broader architecture support
    • risc-v
    • 32-bit
    • WASM (Asyncify?)
    • comptime?

Inspirations