WebAssembly/wasi-parallel

How to pass a function to execute in parallel?

radu-matei opened this issue · 1 comment

In the process of adapting the interface to the new WIT format, I've bumped into a potential issue — the kernel function is supposed to be passed as a function index, or in the future, as a funcref:

;;; The kernel to run in parallel. TODO It is unclear how to represent this as a `funcref`, which
;;; itself may not be the final mechanism for identifying which kernel to run.
(typename $function u32)

If I am not missing anything, this assumes the host implementation will have access to the underlying caller object, so it can access a table or exported function.
However, this is not yet possible with wit-bindgen and, more importantly, may not be in scope for the interface types proposal:

I think that that's something we should treat as either blocking wasi-parallel or Interface Types: we want all WASI interfaces to be virtualizable without breaking encapsulation, so while we can work around limitations like this in wit-bindgen, that's not workable more generally, I think (@tschneidereit in bytecodealliance/wit-bindgen#130 (comment))

I am not sure how we could work around this in the current interface without a change in wit-bindgen, but as Till points out in that thread, that might not be in scope.
What do you think?
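To make the encapsulation concern concrete, here is a minimal native-Rust sketch (not wasi-parallel code; all names here are mine, for illustration only) of what dispatch by function index implies: the `u32` the guest passes is meaningless unless the host can reach into the guest's function table to resolve it.

```rust
// Hypothetical sketch: resolving a kernel passed as a `u32` index.
// The host can only do this if it has access to the caller's
// function table, which is the encapsulation problem discussed above.

type Kernel = fn(&mut [u8]);

// An example kernel the guest might want to run in parallel.
fn double_bytes(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = b.wrapping_mul(2);
    }
}

fn main() {
    // Stand-in for the guest module's funcref table.
    let table: Vec<Kernel> = vec![double_bytes];

    // The guest passes only this index across the interface boundary...
    let kernel_index: u32 = 0;

    // ...so the "host" must index into the guest's table to find the code.
    let kernel = table[kernel_index as usize];

    let mut data = vec![1u8, 2, 3];
    kernel(&mut data);
    assert_eq!(data, vec![2, 4, 6]);
}
```

A virtualized implementation has no such table in scope, which is why this cannot be expressed through the interface alone today.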

For context, the current state of this interface, adapted to the new WIT format (WIP here):

// A memory buffer usable in a parallel context.
resource buffer

// The ways a buffer can be accessed.
enum buffer-access-kind {
    read,
    write,
    read-write
}

// The contents of a parallel buffer.
// If WASI adopts a [canonical ABI](https://github.com/WebAssembly/interface-types/pull/132), 
// this type would be replaced by `pull-buffer` and `push-buffer`.
type buffer-data = list<u8>

// The size of a buffer.
type buffer-size = u32

// A device used for parallel calls.
resource parallel-device

// The kinds of devices available.
enum device-kind {
    cpu,
    discrete-gpu,
    integrated-gpu
}

// The kernel to run in parallel.
// TODO: It is unclear how to represent this as a `funcref`, which
// itself may not be the final mechanism for identifying which kernel to run.
type func = string

// Error codes returned by functions in this API.
enum error {
    success
}

// Retrieve a system device using a hint.
// The implementation may choose to ignore the hint and return any kind of device.
get-device: function(hint: device-kind) -> expected<parallel-device, error>

// Create a buffer on a device.
create-buffer: function(device: parallel-device, size: buffer-size, kind: buffer-access-kind) -> expected<buffer, error>

// Assign bytes from local memory to the parallel buffer; 
// the implementation may choose to copy or not copy the bytes.
write-buffer: function(data: buffer-data, buffer: buffer) -> expected<_, error>

// Retrieve bytes from a parallel buffer into local memory;
// the implementation may choose to copy or not copy the bytes.
read-buffer: function(buffer: buffer, data: buffer-data) -> expected<_, error>

// Run a function in parallel (a "parallel for" mechanism).
// TODO: perhaps return the output buffers instead of a mutable argument?
parallel-for: function(worker: func, num-threads: u32, block-size: u32, in-buffers: list<buffer>, out-buffers: list<buffer>) -> expected<_, error>
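For illustration, here is a rough host-side mock, in plain Rust, of what `parallel-for` is expected to do. Only the `parallel-for` semantics (split buffers across workers, run the kernel per chunk) come from the interface above; the kernel signature, chunking scheme, and use of OS threads are my assumptions, standing in for whatever device backend an implementation would use.

```rust
use std::thread;

// Assumed kernel signature: each invocation sees its worker id and its
// slice of the input and output buffers. This is not specified by the
// interface; it is a stand-in for the unresolved `func` type.
type Kernel = fn(u32, &[u8], &mut [u8]);

// An example kernel: add one to every input byte.
fn add_one(_tid: u32, input: &[u8], output: &mut [u8]) {
    for (o, i) in output.iter_mut().zip(input) {
        *o = i.wrapping_add(1);
    }
}

// Mock of `parallel-for`: split the buffers into `num_threads` chunks
// and run the kernel on each chunk in its own OS thread.
fn parallel_for(kernel: Kernel, num_threads: u32, in_buf: &[u8], out_buf: &mut [u8]) {
    let n = num_threads.max(1) as usize;
    let chunk = (in_buf.len() + n - 1) / n; // ceiling division
    thread::scope(|s| {
        for (tid, (inp, out)) in in_buf.chunks(chunk).zip(out_buf.chunks_mut(chunk)).enumerate() {
            s.spawn(move || kernel(tid as u32, inp, out));
        }
    });
}

fn main() {
    let input = vec![0u8, 1, 2, 3, 4, 5, 6, 7];
    let mut output = vec![0u8; 8];
    parallel_for(add_one, 4, &input, &mut output);
    assert_eq!(output, vec![1, 2, 3, 4, 5, 6, 7, 8]);
}
```

Note that the mutable `out_buf` here mirrors the out-buffers-as-argument shape flagged in the TODO; returning the output buffers instead would let the signature drop the mutable parameter.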

I'm going to rename this issue to something more recognizable to others who are working on this (cc: @mingqiusun, @penzn, @egalli). We have worked on several ideas but the current thinking is that an "outside of wasi-parallel" solution is necessary (e.g., interface types, component model, etc.). I commented in more depth in bytecodealliance/wit-bindgen#130 (comment) and hopefully that can be resolved there (or in a conversation stemming from there). For the time being, until a good higher-level solution is available, `type func = string` does not seem unreasonable.
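For comparison with the index-based approach, here is a hypothetical sketch of the `type func = string` stopgap: the kernel is identified by its exported name and resolved through a name-to-function map on the host side. The map and the kernel are my inventions for illustration; the point is only that a name, unlike a raw table index, can be resolved against the module's exports without peeking at its internal function table.

```rust
use std::collections::HashMap;

// Assumed kernel signature, as in the sketches above.
type Kernel = fn(&mut [u8]);

// An example kernel the guest might export.
fn negate(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = 255 - *b;
    }
}

fn main() {
    // Stand-in for the guest module's exported-function namespace.
    let mut exports: HashMap<&str, Kernel> = HashMap::new();
    exports.insert("negate", negate);

    // The guest passes only the string name across the boundary,
    // and the host resolves it against the exports.
    let kernel = exports["negate"];

    let mut data = vec![0u8, 255];
    kernel(&mut data);
    assert_eq!(data, vec![255, 0]);
}
```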