WebAssembly/wasi-filesystem

Remove paths

SoniEx2 opened this issue · 10 comments

Since we have openat, there's no reason to have paths. Paths are known to introduce major vulnerabilities:

  1. Accidentally allowing subdirectory access when treating arbitrary strings as a file name.
  2. Accidentally allowing parent directory access due to improper handling of .. elements in paths.

Further, paths can be emulated in userspace if really needed, including the semantics of . and ... There is no good reason to have paths in a modern, low-level filesystem API, and they should be considered a legacy technology instead.

It's an interesting idea, since the generic way to sandbox paths is to open them open path component at a time, and we could view this as simply moving that code out of the host and into wasm. And, there's probably a lot of variation in how this is currently implemented in engines.

However, paths allow implementations to use optimized implementations such as Linux's openat2 with the RESOLVE_BENEATH flag rather than opening one path component at a time, allowing it to open complex paths with a single system call instead of a system call per path component including symlink expansions.

It would also bring support for filesystems that don't use a "path DSL" but instead treat paths as true arrays of edges.

If chained opening of one path component at a time is too popular, we can also add an API that allows chained opening of one path component at a time. We'd still discourage the engine from building up a chained opening script (path DSL) from the path components, since that would bring back all of the issues avoided by moving away from scripting your openats, but it could do that to avoid the extra syscalls.

I don't think supporting exotic filesystems that don't use conventional path strings is enough of a motivation to make the system less optimal or more complex for the popular use cases.

what about System 7 support?

Do you mean Mac OS System 7, the last release of which was 26 years ago? I think it's safe to call that exotic in this context. I'm not opposed to supporting it in principle, but if it would require wasi-filesystem to be significantly more complex for the platforms that the vast majority of users today are using, I don't think that tradeoff is worthwhile.

Path strings are what the vast majority of platforms in use today use, and it's a goal of wasi-filesystem to support existing filesystems. Changing this is theoretically interesting, but with the filesystems in use today, it would add overhead, preclude some optimizations, increase code size, and not bring any advantages for the vast majority of expected use cases.

In principle I like @SoniEx2 suggestion, so I just wanted to add third option for the records; keep paths in the interface, but represent them as a list of path segments, instead of a single / separated string. This would tackle both the OP's concerns while also allowing RESOLVE_BENEATH-like syscall optimizations. In exchange for more heap allocations.

Having said that, I agree with @ sunfishcode's previous comment. So no need to reopen this issue.

how much of a bottleneck is it exactly? as in ppl thought rust's bounds checking was gonna be a bottleneck and rust proved them wrong.

we still believe an API that is harder for implementers to get wrong is better than an API that has giant warnings about how to implement it correctly.

It can be as much as the difference between making one host system call per open and one host system call per path component per open. If you believe the situation might be analogous to array bounds checking in Rust, I encourage you to prototype something and measure the overhead.

An additional consideration is the complexity in guest toolchains of supporting existing code which uses paths.

we don't mind taking the path as an array of path components. if you're gonna be interpolating path components (strings) into a path (for passing onto host API with RESOLVE_BENEATH), hopefully that's enough to make the implementer think about escaping/filtering.

the complexity in guest toolchains is definitely not insignificant tho.

it's kinda unfortunate that the filesystem is so ubiquitous across modern systems that you can't even begin to consider making it better. and we don't even mean replacing it or getting rid of it.