WebAssembly/wasi-filesystem

Clarification: Is file offset specific to fd or an open file handle?

yagehu opened this issue · 5 comments

In Linux, each open file has an offset and multiple file descriptors can point to the same open file handle in the system-wide open file table. So it can look like this:

File descriptor table         System-wide open file table
+-----+                       +--------+-----------+
|   # |                       | offset | inode ptr |
+=====+                       +========+===========+
|   0 |------+--------------> |      0 |       ... |
+-----+      |                +--------+-----------+
| ... |      |
+-----+      |
|  42 |------+
+-----+

This means file offsets are not associated with FDs but with the open file descriptions.
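For illustration, the sharing described above can be observed from user space. The following is a minimal sketch, assuming a POSIX-like host; the helper name `demo_shared_offset` and the scratch file are hypothetical and not part of WASI or of this issue. It compares a descriptor created with `dup` against one created by a second `open` of the same path.

```python
import os
import tempfile


def demo_shared_offset():
    """Return (offset seen via the dup'd fd, offset seen via a separately opened fd)."""
    # Scratch file with a few bytes so there is something to seek within.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(b"0123456789")
    tmp.close()

    fd_a = os.open(tmp.name, os.O_RDONLY)  # new open file description
    fd_dup = os.dup(fd_a)                  # second fd, same open file description
    fd_b = os.open(tmp.name, os.O_RDONLY)  # separate open file description
    try:
        # Move the offset through fd_a only.
        os.lseek(fd_a, 4, os.SEEK_SET)
        # Query the other two descriptors without moving them.
        dup_offset = os.lseek(fd_dup, 0, os.SEEK_CUR)
        independent_offset = os.lseek(fd_b, 0, os.SEEK_CUR)
    finally:
        for fd in (fd_a, fd_dup, fd_b):
            os.close(fd)
        os.unlink(tmp.name)
    return dup_offset, independent_offset


if __name__ == "__main__":
    print(demo_shared_offset())
```

On Linux the dup'd descriptor should report the offset set through `fd_a`, while the separately opened descriptor stays at 0.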

My question is: Is this the case in WASI as well? Does each FD in WASI have its own file offset? As far as I can tell, WASI does not have the concept of open file descriptions.

The WASI comment for fd_tell says:

  ;;; Return the current offset of a file descriptor.
  ;;; Note: This is similar to `lseek(fd, 0, SEEK_CUR)` in POSIX.

which to me implies offset is specific to each FD.

The Linux man page for lseek instead says:

lseek() repositions the file offset of the open file description associated with the file descriptor

which makes it clear that offset is not associated with a file descriptor but with the open file description/handle.

Indeed, WASI basically doesn't have a concept of open file descriptions. This is actually one of the reasons why WASI doesn't have dup or any other way to request multiple file descriptors for the same file description. The main use case for dup involves fork, which we're not supporting anyway, so we chose to keep our options open.

So are you saying:

  1. Offset is per fd, or
  2. It is unspecified.

It's conceivable that the WASI execution environment (say Wasmtime) dups the FD and passes it to the WASI program. In that case, we have two fds that share the same offset (assuming the Wasm runtime is running on something like Linux). Is this possible?

If we consider arbitrary environment behavior, there's very little we can say. It is indeed possible for someone using e.g. Wasmtime's API to create a WASI execution environment containing two file descriptors which share an open file description, in which case a WASI application would indeed be able to do a seek on one and observe the effect on the other. However, it's also possible for someone to create a thread and have it do random seeks on a file descriptor that a WASI environment also has, in which case WASI would see the file descriptor's offset changing randomly, with no apparent cause. There'd be no way to reliably tell the difference.

That is to say, I believe the difference between 1 and 2 is not observable right now.
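The host-thread scenario can be sketched in the same way. This is an illustrative simulation only, assuming a POSIX-like host: `guest_tell` and `demo_host_interference` are hypothetical names, and `guest_tell` merely stands in for `fd_tell` using the `lseek(fd, 0, SEEK_CUR)` analogy from the WASI comment. No real WASI runtime is involved. The "host" thread here seeks to a fixed position rather than a random one so the result is deterministic.

```python
import os
import tempfile
import threading


def guest_tell(fd):
    # Stand-in for fd_tell: similar to lseek(fd, 0, SEEK_CUR) in POSIX.
    return os.lseek(fd, 0, os.SEEK_CUR)


def demo_host_interference():
    """Return the offset the 'guest' observes before and after a host-side seek."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(b"0123456789")
    tmp.close()

    # One descriptor, visible both to the simulated guest and to host code.
    guest_fd = os.open(tmp.name, os.O_RDONLY)
    try:
        before = guest_tell(guest_fd)
        # Host thread seeks on the very same descriptor behind the guest's back.
        host = threading.Thread(target=os.lseek, args=(guest_fd, 7, os.SEEK_SET))
        host.start()
        host.join()
        # The guest never issued a seek, yet its offset has moved.
        after = guest_tell(guest_fd)
    finally:
        os.close(guest_fd)
        os.unlink(tmp.name)
    return before, after


if __name__ == "__main__":
    print(demo_host_interference())
```

From the guest's side, this change looks the same as one caused by a second descriptor sharing the open file description, which is the sense in which options 1 and 2 cannot be told apart.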

The main problem here is that the interaction between host usage of POSIX APIs and WASI guest usage of WASI APIs in the same process is not currently well specified. Are you allowed to give WASI duped file descriptors? Are you allowed to give WASI file descriptors that you haven't duped but have made copies of (so you can still lseek and do other things on them behind WASI's back)? I think we want to say no to both, but we don't yet have a framework for saying such things.

OK. Thanks for the clarification. Closing.