WebAssembly/wasi-libc

Proposal: add `wasm32-wasi-preview2` target, prioritizing `wasi-sockets` support

dicej opened this issue · 28 comments

dicej commented

Summary

This is a proposal to add wasi-sockets support to wasi-libc as a first step towards full WASI Preview 2 support. This includes adding a new wasm32-wasi-preview2 build target to differentiate it from the existing wasm32-wasi{-threads} targets based on WASI Preview 1.

Background

WASI Preview 2 has been under development for a few years and should be finalized by the end of 2023 or early 2024. Unlike Preview 1, which was defined using WITX and the core WebAssembly specification, Preview 2 is based on WIT and the Component Model proposal. In addition, Preview 2 uses unforgeable resource handles instead of file descriptors to track files, sockets, and other host resources.

In order to ease the transition from Preview 1 to Preview 2, the Wasmtime team has created an adapter which may be used with wit-component to convert modules targeting Preview 1 into components targeting Preview 2. Therefore, any toolchain targeting Preview 1 can be used in combination with the adapter and wit-component to generate Preview 2 components. Moreover, developers can use tools like wit-bindgen to access Preview 2 features beyond the scope of Preview 1, including full TCP and UDP socket support via wasi-sockets and high-level HTTP support via wasi-http, with additional interfaces in the works (e.g. wasi-cloud-core).

However, using wasi-sockets via direct host calls is akin to doing networking on native platforms via syscalls, bypassing libc and/or the standard library of the language being used. That makes reusing third-party, network-aware libraries such as database drivers difficult or impossible since they are normally designed to use the relevant standard library. So while it is technically possible to use wasi-sockets in applications today, it will not be ergonomic or practical until the standard libraries of various programming languages have been ported to use it, starting with wasi-libc.

On the other hand, standard libraries with existing Preview 1 support for features such as filesystem access, env variables, clocks, etc. will continue to work in a Preview 2 environment via the adapter, so there's less urgency to update those parts of wasi-libc to use Preview 2 directly.

Proposal

Given the situation described above, we're proposing to create a new wasm32-wasi-preview2 build target for wasi-libc and wasi-sdk which will initially use the Preview 1 host APIs (as implemented by the Preview 1->2 adapter) for everything except sockets, which will bypass the adapter and use the Preview 2 host APIs directly. From there, we'll incrementally replace the Preview 1 parts with their Preview 2 equivalents until the adapter is no longer needed at all.

During the transition period, wasi-libc and the adapter will share responsibility for mapping Preview 1 file descriptors to Preview 2 resource handles, with the former handling sockets and the latter handling files and stdio. In order to avoid confusion (e.g. both wasi-libc and the adapter using the same descriptor to mean different things), we'll add a new adapter_open_badfd function to the adapter, which wasi-libc will use to reserve descriptors for its use, indicating that the adapter should return EBADF if it receives any Preview 1 calls for such descriptors besides fd_close.
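Here is a minimal C sketch of the reservation scheme. It is a simplified mock, not the real adapter or wasi-libc code. The proposal does not give a signature for `adapter_open_badfd`, so the one below is a hypothetical assumption, and so are the other helper names. wasi-libc asks the adapter for a descriptor number and records the Preview 2 socket handle under that number in its own table. The adapter only knows that the number is reserved, and it answers EBADF for every Preview 1 call on it except `fd_close`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FDS 64

/* ---- Mock adapter side (hypothetical API, for illustration only) ---- */

/* true if the adapter has reserved this descriptor on behalf of wasi-libc */
static bool adapter_badfd[MAX_FDS];
/* next descriptor the mock adapter hands out; 0-2 are stdio */
static int adapter_next_fd = 3;

/* Hypothetical: reserve a descriptor number and return it, or -1 if full. */
static int adapter_open_badfd(void) {
    if (adapter_next_fd >= MAX_FDS) return -1;
    int fd = adapter_next_fd++;
    adapter_badfd[fd] = true;
    return fd;
}

/* Mock Preview 1 call (e.g. fd_read). Returns an errno-style code as
 * Preview 1 calls do; reserved descriptors always get EBADF. */
static int adapter_fd_read(int fd) {
    if (fd < 0 || fd >= MAX_FDS || adapter_badfd[fd]) return EBADF;
    return 0; /* pretend success for descriptors the adapter owns */
}

/* fd_close is the one Preview 1 call still accepted on reserved fds. */
static int adapter_fd_close(int fd) {
    if (fd < 0 || fd >= MAX_FDS) return EBADF;
    adapter_badfd[fd] = false;
    return 0;
}

/* ---- wasi-libc side ---- */

typedef uint32_t p2_handle; /* stand-in for a Preview 2 resource handle */

static bool libc_has_entry[MAX_FDS];
static p2_handle libc_handles[MAX_FDS];

/* Reserve a descriptor in the adapter and map it to a Preview 2 socket. */
static int libc_register_socket(p2_handle socket) {
    int fd = adapter_open_badfd();
    if (fd < 0) {
        errno = EMFILE;
        return -1;
    }
    libc_has_entry[fd] = true;
    libc_handles[fd] = socket;
    return fd;
}
```

Because both sides agree on which numbers belong to wasi-libc, the same descriptor can never mean one thing to the adapter and another to libc.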

Testing

Currently, wasi-libc relies on a subset of a fork of musl's libc-test suite for testing. We plan to expand that subset to include all relevant socket tests. In addition, we'll be updating the Rust and Python standard libraries to match progress made in wasi-libc, enabling the socket tests in their test suites as well.

Prototype

I've created experimental forks of wasi-libc, the adapter, and Rust, along with a test harness that demonstrates the use of wasi-sockets via the Rust standard library.

sbc100 commented

Sounds like a good plan, although I'm not sure about the use of the new target triples. In the past we have talked about trying not to let all our build variants result in proliferation of target triples.

Presumably the preview2 stuff all lives deep inside of libc itself, so it won't affect user-level object files, e.g. one won't need to rebuild everything from source just to link against this new flavor of libc?

dicej commented

> Sounds like a good plan, although I'm not sure about the use of the new target triples. In the past we have talked about trying not to let all our build variants result in proliferation of target triples.

Yeah, @alexcrichton and I were brainstorming ways to incrementally add Preview 2 support (specifically wasi-sockets) without creating a new triple, e.g. relying on dead code elimination and/or weak symbols to ensure that people still get Preview 1-compatible modules as long as they don't use anything that requires Preview 2 (e.g. making outbound TCP connections). That approach would essentially mean anyone who wants to target Preview 2 must continue using the adapter indefinitely, which is awkward but not out of the question.
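Below is one possible shape for the weak-symbol idea, sketched in C under stated assumptions. `hypothetical_tcp_connect` is a made-up libc-internal hook, not a real wasi-libc symbol. A weak default definition touches no Preview 2 imports, so a module that only links the default stays Preview 1-compatible. A strong definition in another object file, for example one built on wasi-sockets, would replace the default at link time when that object is linked in.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical libc-internal hook; not a real wasi-libc symbol.
 * This weak default never calls Preview 2 imports, so linking only this
 * definition keeps the module Preview 1-compatible. A strong definition
 * elsewhere (e.g. using wasi-sockets) overrides it at link time. */
__attribute__((weak)) int hypothetical_tcp_connect(int fd) {
    (void)fd;
    errno = ENOSYS; /* Preview 1 has no call for opening outbound TCP */
    return -1;
}
```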

> Presumably the preview2 stuff all lives deep inside of libc itself, so it won't affect user-level object files, e.g. one won't need to rebuild everything from source just to link against this new flavor of libc?

Correct -- they'll only need to rebuild if they want to use the new Preview 2-only features.

sbc100 commented

> Correct -- they'll only need to rebuild if they want to use the new Preview 2-only features.

Would they even need to rebuild the object files, though?

I'm assuming the public headers don't need to change between preview1 and preview2, only the implementation within libc itself. So the same object code could link with this new flavor of libc and work just fine; it would just get a different implementation of, e.g., the socket library functions.

e.g.: Wouldn't something like this work:

$ clang -target wasm32-wasi mycode.c -c
$ clang -target wasm32-wasi mycode.o -o uses_preview1.wasm -L/path/to/preview1-libc/ -lc
$ clang -target wasm32-wasi mycode.o -o uses_preview2.wasm -L/path/to/preview2-libc/ -lc

I do think it's theoretically possible to support a single sysroot with something like -lc-preview1 and -lc-preview2, but I also think this is an important enough change in functionality that it's best to give it a succinct name and configuration strategy. For example, the Rust toolchain is going with two targets, and I think the reasons that route was chosen there apply equally to wasi-libc.

sbc100 commented

You could also have two different sysroots without changing the triple:

$ clang --sysroot=/path/to/preview1 --target wasm32-wasi
$ clang --sysroot=/path/to/preview2 --target wasm32-wasi

I seem to remember from previous discussions that we didn't want to create a lot of new target triples on the llvm / wasi-libc side?

It seems like Rust likes to use target triples for all this stuff, but I want to avoid taking that route just because of Rust's inertia. If it makes sense not to have so many triples out there, then perhaps we should instead be pushing back against Rust. Maybe this is a case where a new triple makes sense? I'm not totally sure. For threads I think we decided it did. For other features it might not make sense.

Are you proposing a new triple for each iteration? i.e. preview3 and preview4 would also get triples?

I personally agree that in theory one triple or not many triples would be best, but I don't think that it's actually possible to achieve that. Following Rust here to me isn't just inertia, it's well-motivated independently.

Each triple requires an entire rebuild of libc, or a separate sysroot, at minimum. This means that to wasi-libc it's a bunch of #ifdef and #define no matter what. At that point it's a question of what to actually call things. I personally believe that a target provides a clear name by which to identify and communicate what's going on here. Benefits include:

  • Minimizing flags: for example, you only need to pass --target wasm32-wasi-preview2.
  • Detection when part of a project uses one target and part uses another.
  • Easier to communicate to end-users, downstream projects, downstream toolchains, etc. Many languages depend on wasi-libc, and I don't think it's feasible to have all of them support one WASI target with various flags to select the right libc.
  • Provides a clear path to phasing out wasm32-wasi, as-is today, as a target. WASI preview1 is at a "dead end" and won't receive any further development, so it needs to be removed at some point in the future.
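The following rough C illustration shows the point that wasi-libc ends up as "a bunch of #ifdef and #define" whichever way the variants are named. The macro `HYPOTHETICAL_WASIP2` is invented for this example. It is not a macro that clang or wasi-libc is known to define. With a dedicated target, one toolchain-level switch selects the variant for the whole build. Without one, every project has to pass the right flags and sysroot itself.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL_WASIP2 is an illustrative macro, not a real toolchain
 * define. Compile with -DHYPOTHETICAL_WASIP2 to select the Preview 2 path. */
#if defined(HYPOTHETICAL_WASIP2)
static const char *socket_backend(void) {
    /* would call the Preview 2 wasi-sockets imports directly */
    return "preview2: wasi-sockets";
}
#else
static const char *socket_backend(void) {
    /* Preview 1 has no call for creating or connecting sockets */
    return "preview1: unsupported";
}
#endif
```

Built without the macro, `socket_backend()` returns `"preview1: unsupported"`.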

Adding a target is a big thing in my opinion and isn't something that should be taken lightly, but the transition to the component model is a big deal and likewise isn't something that can be taken lightly. For example, for the Rust compiler I'm proposing that it use a wrapper around wasm-ld, probably called wasm-component-ld (which doesn't exist yet). The native output of the target will be a component, not a core wasm module. I think that wasi-{sdk,libc} should eventually follow suit too, once we figure everything out.

> Are you proposing a new triple for each iteration? i.e. preview3 and preview4 would also get triples?

Yes. This is expected to be on the timescale of years, however, and the old targets will get phased out. There is no timeline at this point for preview3, and it's in the future so we don't even know exactly what it might look like. Part of the possible development of preview3 and breaking changes would take into account that it's a bummer to add targets to all the languages. That may mean this never ends up happening; we basically need to ship preview2 to figure out the answer.

sbc100 commented

> Following Rust here to me isn't just inertia, it's well-motivated independently.

Thanks for all the clarifying data; my intent here was only to make sure we were considering this stuff carefully, and it sounds like you have.

yamt commented

those targets won't have any actual ABI differences when you just do cc -c, right?
if so, i feel it's simpler and better to switch only the libc binary (or maybe sysroot) than having multiple targets.

I propose we don't try to guarantee full ABI compatibility between the two targets. I'm aware that full ABI compatibility would be nice to have, but ABIs are tricky to preserve when doing cross-cutting changes like preview1->preview2, so if we were to guarantee that, we'd really want much more comprehensive ABI compatibility testing to protect users from silent accidental ABI breakage.

Also, the preview1 ABI contains types and constants from cloudabi and preview1 which would benefit from being updated for preview2.

I would also say that, in addition to what @sunfishcode already said, the preview2 target is already planned to have incompatible ABIs. Less so at the actual "this type is this size" level; an example is a file descriptor. In preview1 a "file descriptor" is handed off to wasi_snapshot_preview1 functions "raw", but in preview2 a file descriptor will be a runtime abstraction within libc itself because the component model works differently. This means that preexisting code interoperating with libc file descriptors and wasi_snapshot_preview1 manually would be broken on the preview2 target.

Overall there is so much different about preview2 that, as I mentioned above, historical discussions about this have always pointed in the direction of a new target. There's no disagreement that a second sysroot or a second binary copy of libc could work, but my post above explains why there are issues beyond that isolated technical issue which are better solved with a second target.

sbc100 commented

> This means that preexisting code interoperating with libc file descriptors and wasi_snapshot_preview1 manually would be broken on the preview2 target.

I agree there may be ABI differences, but with wasi-libc aren't they mostly hidden by the libc abstraction? libc-using code that uses things like int fd and FILE* f should continue to work without modification, right?

(Also, note that we don't currently prevent linking of object files built with different targets, since the target is not currently encoded in the object file. We have talked about adding this feature, though.)

If a project exclusively used wasi-libc and nothing else, then yes, it could work with either a preview1 or preview2 sysroot without modification. Having worked on the WASI targets in Rust, which use wasi-libc, I can say that is definitely not the case there. My experience is that in general it's pretty rare for a non-trivial program to use only libc and literally nothing else. The interop is what I'm chiefly worried about, and that's where it's extremely useful to have a high-level distinction to use when telling end developers what to target and how to shape their own build processes, etc.

ydnar commented

I'm not sure if this is useful feedback, as it applies more to Rust than wasi-libc, but here goes:

TinyGo uses wasi-libc to target WASI Preview 1, but mainline Go targets the WASI Preview 1 APIs directly.

Given the abstraction level of the Component Model vs the C ABI style of Preview 1, would it make sense for Rust to implement guest bindings for Preview 2 / Component Model directly, rather than via wasi-libc?

(Related: work is in progress to target Preview 2 directly in TinyGo instead of depending on wasi-libc and the P1->P2 adapter.)

Edit: I also wonder if a native Rust implementation of a Preview 2 guest would help inform a C implementation here.

That's a good question, and is something that's been considered when talking to folks historically, but the conclusion has generally been that it's desirable to be able to link Rust and C (and maybe C++) into the same WebAssembly binary. In that situation it's best to have wasi-libc as a base set of abstractions. Otherwise it would be impossible to communicate sockets between Rust and C, for example, since C would think a socket is a file descriptor when in Rust it'd be a collection of WASI preview2 resources.

ydnar commented

How often does that happen in practice?

For the cases in which it does, might it be preferable to provide a C wrapper around a memory-safe implementation if interop is required?

Would this just apply to wasi:cli or to higher level components like wasi:http as well?

sbc100 commented

We are getting off topic here, but I can think of a couple more arguments for a given language runtime to use wasi-libc rather than binding directly to the WASI syscall layer:

  1. It risks re-implementing a lot of wasi-libc / libc in each language runtime. I see this as somewhat analogous to directly using the Linux syscall layer and bypassing glibc. You can do it, but there is rarely a good reason to.

  2. Many language runtimes (such as Rust, I imagine) already have code that uses libc-like APIs. If you don't use wasi-libc, you can no longer share that existing code (unless you essentially re-implement things like open/close, etc.).

yamt commented

> I propose we don't try to guarantee full ABI compatibility between the two targets. I'm aware that full ABI compatibility would be nice to have, but ABIs are tricky to preserve when doing cross-cutting changes like preview1->preview2, so if we were to guarantee that, we'd really want much more comprehensive ABI compatibility testing to protect users from silent accidental ABI breakage.

while comprehensive testing is great to have, the lack of it is not a good enough reason to give up compatibility, IMO.

> Also, the preview1 ABI contains types and constants from cloudabi and preview1 which would benefit from being updated for preview2.

for example?

yamt commented

> In preview1 a "file descriptor" is handed off to wasi_snapshot_preview1 functions "raw", but in preview2 a file descriptor will be a runtime abstraction within libc itself because the component model works differently. This means that preexisting code interoperating with libc file descriptors and wasi_snapshot_preview1 manually would be broken on the preview2 target.

it sounds like something which most of the user applications never do.
is it what rust does?

yamt commented

> except sockets, which will bypass the adapter and use the Preview 2 host APIs directly

does this mean to have an equivalent of the wit-bindgen c output of https://github.com/WebAssembly/wasi-sockets in libc as a syscall stub?

yamt commented

> During the transition period, wasi-libc and the adapter will share responsibility for mapping Preview 1 file descriptors to Preview 2 resource handles, with the former handling sockets and the latter handling files and stdio. In order to avoid confusion (e.g. both wasi-libc and the adapter using the same descriptor to mean different things), we'll add a new adapter_open_badfd function to the adapter, which wasi-libc will use to reserve descriptors for its use, indicating that the adapter should return EBADF if it receives any Preview 1 calls for such descriptors besides fd_close.

  • what adapter_open_badfd does is, to open a preview1 descriptor, on which you can only do fd_close, right?
  • how do you plan to dispatch common operations (eg read) to preview1/preview2?

Zig (not zig cc) doesn't use wasi-libc and calls WASI hostcalls directly like Go. This provides better safety, better integration (especially for things such as threads), faster compile times, and makes things more consistent across platforms and easier to debug.

The builtin wasi-libc can still be compiled and linked, which is very useful when mixing C and Zig code in the same application, or to embed code originally written in Zig in Rust crates when targeting WebAssembly (as done for example in the sealed_box and aes-wasm crates).

In that context, preserving ABI compatibility would be immensely useful. And it is not too late to do it.

That would also make it way easier to pass descriptors between Zig code and code using wasi-libc.

@jedisct1 Please understand that ABI compatibility would be a lot of work. And it's work that neither you nor anyone else appears to be volunteering to help with.

dicej commented

> In preview1 a "file descriptor" is handed off to wasi_snapshot_preview1 functions "raw", but in preview2 a file descriptor will be a runtime abstraction within libc itself because the component model works differently. This means that preexisting code interoperating with libc file descriptors and wasi_snapshot_preview1 manually would be broken on the preview2 target.

> it sounds like something which most of the user applications never do. is it what rust does?

Here's a list of Rust crates which currently use WASI Preview 1 host functions directly, ordered by popularity: https://crates.io/crates/wasi/reverse_dependencies. Over time, we'll want them to migrate to Preview 2, but that will take a while, and some might choose to do so incrementally.

dicej commented

> except sockets, which will bypass the adapter and use the Preview 2 host APIs directly

> does this mean to have an equivalent of the wit-bindgen c output of https://github.com/WebAssembly/wasi-sockets in libc as a syscall stub?

Yes, exactly: https://github.com/dicej/wasi-libc/blob/sockets/libc-bottom-half/headers/private/reactor.h and https://github.com/dicej/wasi-libc/blob/sockets/libc-bottom-half/cloudlibc/src/libc/sys/wasi_preview2/reactor.c.

dicej commented

> During the transition period, wasi-libc and the adapter will share responsibility for mapping Preview 1 file descriptors to Preview 2 resource handles, with the former handling sockets and the latter handling files and stdio. In order to avoid confusion (e.g. both wasi-libc and the adapter using the same descriptor to mean different things), we'll add a new adapter_open_badfd function to the adapter, which wasi-libc will use to reserve descriptors for its use, indicating that the adapter should return EBADF if it receives any Preview 1 calls for such descriptors besides fd_close.

> * what `adapter_open_badfd` does is, to open a preview1 descriptor, on which you can only do `fd_close`, right?

Yes, although now I'm thinking I might add an adapter_close_badfd so that even fd_close will return EBADF, since I don't want anyone to call fd_close directly without going through wasi-libc, which is responsible for removing the appropriate entry from the descriptor->handle table.

> * how do you plan to dispatch common operations (eg `read`) to preview1/preview2?

First, check to see if there's an entry in the descriptor->handle table. If so, pass the handle(s) to preview2 call(s). If not, pass the descriptor to preview1 call(s).
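The dispatch rule just described, together with the point about routing close through wasi-libc, can be sketched in C as follows. This is a simplified model with hypothetical names throughout. `p1_fd_read` and `p1_fd_close` stand in for the real Preview 1 imports, and `p2_stream_read` stands in for the Preview 2 stream calls. The fixed-size array is an assumption and may not match how wasi-libc actually stores the descriptor->handle table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define MAX_FDS 64

/* One slot per descriptor that wasi-libc itself manages (e.g. sockets). */
struct p2_entry {
    bool in_use;
    uint32_t input_stream; /* stand-in for a Preview 2 input-stream handle */
};
static struct p2_entry table[MAX_FDS];

/* Hypothetical host-call stand-ins: each writes a marker into the buffer
 * so the caller can tell which path handled the request. */
static ssize_t p1_fd_read(int fd, void *buf, size_t len) {
    (void)fd;
    size_t n = len < 2 ? len : 2;
    memcpy(buf, "p1", n);
    return (ssize_t)n;
}
static int p1_fd_close(int fd) {
    (void)fd;
    return 0;
}
static ssize_t p2_stream_read(uint32_t stream, void *buf, size_t len) {
    (void)stream;
    size_t n = len < 2 ? len : 2;
    memcpy(buf, "p2", n);
    return (ssize_t)n;
}

/* read(): use Preview 2 if libc has a table entry, otherwise Preview 1. */
static ssize_t libc_read(int fd, void *buf, size_t len) {
    if (fd >= 0 && fd < MAX_FDS && table[fd].in_use)
        return p2_stream_read(table[fd].input_stream, buf, len);
    return p1_fd_read(fd, buf, len);
}

/* close(): only wasi-libc can drop its own table entry, which is why
 * callers should not bypass it and call fd_close directly. */
static int libc_close(int fd) {
    if (fd >= 0 && fd < MAX_FDS && table[fd].in_use) {
        /* real code would also drop the Preview 2 resources and release
         * the descriptor reserved in the adapter */
        table[fd].in_use = false;
        return 0;
    }
    return p1_fd_close(fd);
}
```

Once `libc_close` removes the entry, later calls on that descriptor number fall through to the Preview 1 path. In the real design the adapter would then answer EBADF.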

yamt commented

> Preview 2 is based on WIT and the Component Model proposal.

i have been told preview2 does not require component-model.
eg. WebAssembly/WASI#503

at least, this adapter-based approach doesn't seem compatible with the direct use of preview2 from core wasm module.
am i missing something?

> Preview 2 is based on WIT and the Component Model proposal.

> i have been told preview2 does not require component-model. eg. WebAssembly/WASI#503

> at least, this adapter-based approach doesn't seem compatible with the direct use of preview2 from core wasm module. am i missing something?

Almost all of the existing tooling to support Preview 2 involves the Component Model to some degree, but it's also true that it can be supported at the core module level as well. In the latter case, a host must still implement a subset of the Component Model, e.g. the canonical ABI for converting between high-level component types and low-level core Wasm types, analogous to the WITX ABI used in Preview 1. Ideally, a host that supports Preview 2 at the module level would also use the component type custom section (if available) to verify that the guest component types match what the host expects (i.e. not just that the core Wasm types match).
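Here is a small example of what "the canonical ABI for converting between high-level component types and low-level core Wasm types" involves. A UTF-8 WIT `string` crosses the boundary as a pair of `i32` core values: a pointer into linear memory and a byte length. The C sketch below models linear memory as a byte array and lifts such a pair into a host-side copy with a bounds check. It is simplified: there is no UTF-8 validation, no alternative string encodings, and no realloc-based lowering. It is not Wasmtime's implementation, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a core module's linear memory. */
struct linear_memory {
    const uint8_t *bytes;
    uint32_t size;
};

/* Lift a (ptr, len) pair, as used for a UTF-8 `string`, into a
 * NUL-terminated host copy. Returns NULL where a real host would trap
 * (out-of-bounds range) or if allocation fails. */
static char *lift_string(const struct linear_memory *mem, uint32_t ptr,
                         uint32_t len) {
    /* 64-bit arithmetic so ptr + len cannot wrap around */
    if ((uint64_t)ptr + (uint64_t)len > mem->size)
        return NULL;
    char *out = malloc((size_t)len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, mem->bytes + ptr, len);
    out[len] = '\0';
    return out;
}
```

A host that supports Preview 2 at the module level has to perform this kind of lifting and lowering for every import it provides, which is the "subset of the Component Model" it must implement.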

Regarding the adapter: @cpetig has been releasing static library builds which should allow it to be linked into a module as an alternative to using wit-component: https://github.com/cpetig/wasmtime-adapter. Alternatively, it should be possible to add an option to wit-component to output a module instead of a component after linking in the adapter if the static library approach doesn't work for some reason.

Quick update on this: all the wasi-sockets support has been merged and released as part of wasi-sdk 22. The build has also been split into separate wasm32-wasip1, wasm32-wasip1-threads, and wasm32-wasip2 targets.

To my knowledge, no work has yet been done to switch from p1 to p2 APIs outside of wasi-sockets, which is what would be required to make the wasm32-wasip2 target generally usable without the wasi_snapshot_preview1 adapter mentioned above. Meanwhile, the adapter seems to work just fine for most purposes, so I'm not personally planning to work on the p1->p2 API migration, but if someone wants to dive in, that would be great!