WebAssembly/wasi-filesystem

dirent::d_ino

Opened this issue · 23 comments

yamt commented

depends on the underlying platforms and/or filesystems, it can be very expensive to provide dirent::d_ino.
i'd suggest to make it optional.
eg. allow a wasi implementation to say it unknown in a way similar to filetype.unknown.
how do you think?

cf. https://github.com/bytecodealliance/wasm-micro-runtime/blob/b49fb8a66840d484e775936399456fb6b85f98c6/core/iwasm/libraries/libc-wasi/sandboxed-system-primitives/src/posix.c#L2101-L2105

The ability to compare two files' inode numbers is a common way to test whether two file descriptors refer to the same file. If we make d_ino optional, that common idiom wouldn't work.

On what platforms is computing d_ino expensive?

yamt commented

The ability to compare two files' inode numbers is a common way to test whether two file descriptors refer to the same file. If we make d_ino optional, that common idiom wouldn't work.

On what platforms is computing d_ino expensive?

eg. nuttx doesn't have it in struct dirent.
although it has st_ino field in struct stat, the majority of its filesystems doesn't provide it.
https://github.com/apache/incubator-nuttx/blob/81ff16c675296d4d411fe1a2c1eaf74f52fa33cc/include/dirent.h#L103-L115
https://github.com/apache/incubator-nuttx/blob/81ff16c675296d4d411fe1a2c1eaf74f52fa33cc/include/sys/stat.h#L146

eg. littlefs doesn't seem to have handy on-disk numbers appropriate for the purpose.
https://github.com/littlefs-project/littlefs/blob/40dba4a556e0d81dfbe64301a6aa4e18ceca896c/lfs.h#L268-L281

Ok, so then it seems to question is: should we make wasi-filesystem less useful for many applications, in order to make it more useful for people that want to use it on nuttx and littlefs?

yamt commented

my suggestion is merely to define an official way for apps to know the lack of the certain underlying feature. (file number)
right now, some wasi implementations are using d_ino = 0 for the purpose.
for example, we can just make it official. or maybe we can introduce another special number like -1. or maybe we can introduce pathconf-like api or something along the line.
i don't think it makes "wasi-filesystem less useful for many applications".

Comparing inode numbers is a common Unix idiom. We can define a new API or a new sentinel convention, but application code ported to WASI from other platforms wouldn't know to use it.

Nuttx advertises itself as POSIX compatible, but in the code it has this comment.

What should we do?

yamt commented

but application code ported to WASI from other platforms wouldn't know to use it.

sure, but it won't make the situation any worse.

some more data points:

  • uvwasi uses d_ino=0 as the underlying uv fs api doesn't provide it.
  • windows FindNextFileW doesn't seem to provide d_ino equivalent.

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

yamt commented

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.

in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

The repository is at https://github.com/WebAssembly/wasi-testsuite/. As I said, we recently started work on it so there are no tests there yet, but this is the path forward.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.

in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

There are users that use d_ino, even though I also expect you're right that many more don't.

Here's an idea: What if we said it's ok for readdir to omit an inode value, and we teach wasi-libc's readdir to do an extra stat call to get the inode, if d_ino doesn't have it?

That way, wasi-libc users would continue to get a working d_ino field. But we could then offer special options for users to opt out of the extra stat call if they don't need d_ino.

yamt commented

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

The repository is at https://github.com/WebAssembly/wasi-testsuite/. As I said, we recently started work on it so there are no tests there yet, but this is the path forward.

thank you for the link.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.
in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

There are users that use d_ino, even though I also expect you're right that many more don't.

Here's an idea: What if we said it's ok for readdir to omit an inode value, and we teach wasi-libc's readdir to do an extra stat call to get the inode, if d_ino doesn't have it?

That way, wasi-libc users would continue to get a working d_ino field. But we could then offer special options for users to opt out of the extra stat call if they don't need d_ino.

while it would work for platforms which doesn't provide d_ino equivalent,
it merely makes readdir more expensive in case the underlying filesystems w/o suitable on-disk numbers.
i'm afraid that the latter might be more common reason to omit d_ino.

probably readdir can tell libc one of the following:
a. d_ino is here
b. omitted, but you can use stat to get it
c. no d_ino available on this filesystem
but i'm feeling a bit over-engineering now.

I've now posted WebAssembly/wasi-libc#345 to implement the wasi-libc change discussed here.

while it would work for platforms which doesn't provide d_ino equivalent, it merely makes readdir more expensive in case the underlying filesystems w/o suitable on-disk numbers. i'm afraid that the latter might be more common reason to omit d_ino.

probably readdir can tell libc one of the following: a. d_ino is here b. omitted, but you can use stat to get it c. no d_ino available on this filesystem but i'm feeling a bit over-engineering now.

Unlike d_ino (which is in XSI, an optional part of POSIX) st_ino is required by POSIX, so I don't think supporting filesystems with no way to provide st_ino are things we can support.

Other than that, I think the basic design you described here is what I've proposed; d_ino can be present or absent.

It feels like this makes readdir needlessly expensive. For example, in Wazero we use the Go FS for our WASI FS implementation, meaning that it's a flexible FS backend for our users. Most real FS backends provide us with inodes, so we can pass that into WASI, but memory and virtual FS systems do not have inodes.

It would be nice if we could set d_ino to something to say that the FS is never going to return an inode, and that it then not calls the fstatat (maybe -1?)

There's also a bit of a flaw in the reasoning behind adding this, yes, inode is commonly used to check if a file is the same. But this should always happens in combination with st_dev, because st_ino is not unique across multiple devices. Howerver, WASI does not provide for st_dev. Should we somehow combine st_dev and st_ino?

It feels like this makes readdir needlessly expensive. For example, in Wazero we use the Go FS for our WASI FS implementation, meaning that it's a flexible FS backend for our users. Most real FS backends provide us with inodes, so we can pass that into WASI, but memory and virtual FS systems do not have inodes.

The resolution here should make readdir at the WASI level less expensive on these kinds of filesystems, while making readdir at the C/POSIX abstraction level approximately the same cost as it was before.

If you're saying that we shouldn't pay the cost at the C/POSIX abstraction level, then I'd ask the same question I asked above: is efficiency at the C/POSIX level on these specialized filesystems worth teaching all existing application code written at the C/POSIX abstraction level that st_dev+st_ino isn't sufficient to identify a file?

There's also a bit of a flaw in the reasoning behind adding this, yes, inode is commonly used to check if a file is the same. But this should always happens in combination with st_dev, because st_ino is not unique across multiple devices. Howerver, WASI does not provide for st_dev. Should we somehow combine st_dev and st_ino?

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

If you're saying that we shouldn't pay the cost at the C/POSIX abstraction level, then I'd ask the same question I asked above: is efficiency at the C/POSIX level on these specialized filesystems worth teaching all existing application code written at the C/POSIX abstraction level that st_dev+st_ino isn't sufficient to identify a file?

Not saying that, just looking for a solution where we do not have to do a fstatat caused by the PR when we already know that that is also not going to provide an inode (ever).

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

Huh, indeed. I was rather talking about dirent though. But it looks like POSIX also doesn't provide for that. 🤔
In that case I'm wondering if there are actually people using the readdir result directly to do this comparison, since it doesn't contain the dev, or whether they are then also doing a stat after the readdir.

I'm mostly feeling that wasi-libc has to assume that the runtime will fill in d_ino when it has the inode, putting in the fstatat logic in wasi-libc feels like moving the responsibility. The runtime can also just do that internally if they really want to return the d_ino.

Not saying that, just looking for a solution where we do not have to do a fstatat caused by the PR when we already know that that is also not going to provide an inode (ever).

That raises a different question than this issue started with.

POSIX says "The st_ino and st_dev fields taken together uniquely identify the file within the system." citation. Consequently, it's impossible to implement POSIX on an implementation where statat doesn't provide unique st_ino values, as you describe.

Should we tell all programs that they can't rely on this POSIX guarantee? Or shall we consider those implementations that do that non-conforming?

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

Huh, indeed. I was rather talking about dirent though. But it looks like POSIX also doesn't provide for that. thinking In that case I'm wondering if there are actually people using the readdir result directly to do this comparison, since it doesn't contain the dev, or whether they are then also doing a stat after the readdir.

What people do is read the st_dev from the directory they're reading from, since it'll be the same for all the contents of that directory (even if something is mounted on top of something—readdir shows the inode for the thing underneath the mount).

I'm mostly feeling that wasi-libc has to assume that the runtime will fill in d_ino when it has the inode, putting in the fstatat logic in wasi-libc feels like moving the responsibility. The runtime can also just do that internally if they really want to return the d_ino.

The problem is, people writing programs at the POSIX level won't be aware of this "zero means unavailable" behavior. POSIX-abiding code that sees two open files with the same inode number will assume they're the same, and misbehave. Do we just say "sorry applications, we know POSIX said you could do that, but we can't have you doing that; fix your code", or do we say "sorry engine implementors, you can't use filesystems where fstatat doesn't have an inode"?

Should we tell all programs that they can't rely on this POSIX guarantee? Or shall we consider those implementations that do that non-conforming?

I see that as the problem of the implementation indeed, not wasi-libc's. If documentation behavior is that d_ino is optional and that compilers should do fstatat in case it's 0 then it would be a different story.

What people do is read the st_dev from the directory they're reading from, since it'll be the same for all the contents of that directory (even if something is mounted on top of something—readdir shows the inode for the thing underneath the mount).

Make sense, thanks!

The problem is, people writing programs at the POSIX level won't be aware of this "zero means unavailable" behavior. POSIX-abiding code that sees two open files with the same inode number will assume they're the same, and misbehave. Do we just say "sorry applications, we know POSIX said you could do that, but we can't have you doing that; fix your code", or do we say "sorry engine implementors, you can't use filesystems where fstatat doesn't have an inode"?

I think the latter indeed. If WASI says that d_ino/ino/dev is required, then it's completely the problem of the runtime to make sure it's filled, if the FS in the runtime doesn't support it then it should probably not be used for WASI. I don't think wasi-libc should implement some shady workaround to make that work in some cases. In my opinion @yamt should just do an extra stat in their fd_readdir implementation to get the inode.

@jerbob92 I'm not familiar with Go FS, but I talked to someone who knows more about it, and they suggested that for some implementations of io/fs.FS (like the posix-ish os.dirFS, but not embed.FS), you can extract an inode by calling the io/fs.FileInfo.Sys() method, fallibly type-asserting the returned interface to *syscall.Stat_t, and readig its Ino field. Do you think that would work in Wazero?

some files are virtual, ex backed by byte buffers and whatnot. others indeed use embed.FS, so we can only use *syscall.Stat_t on real files. As wasm is supposed to be virtualization with security properties etc, it would seem typical that files are not always going to be real files, and even if they were you wouldn't necessarily want to leak the real inodes.

my 2p is that this is both an expensive requirement, and not a very portable one wrt how files are supplied in wasm. It
assumes direct plumbing to real files, which makes sense (except perf) when that's the case. It doesn't map well to virtualized files and virtualized files are in use today in wasm when someone uses a binary as a library (ex fake a file input from a buffer) or serve content from embedded data, etc. What exactly to do when this is the case seems worthy of a primer, as even though wazero folks noticed this I would doubt we can assume all wasi runtimes always use real files.

fyi in benchmarks, using real files measures an order of magnitude slower for a small directory listing vs embedded FS. We'll implement what's needed, defensively, but generally speaking the more things that bind very strictly, the harder the wasi+wasm promise of portability will be to keep.

One way to at least know what will break without very experienced people guessing is more tripwire tests in the wasi-libc repo or at least the webassembly org. Making a call that helps compiler portability but hurts runtime in the process may be still the path to take, but at least the context will be known more before making a decision and releasing it.

my 2p

The virtualization use case is an interesting point here.

The dev+ino scheme creates two problems here: The device number part assumes that devices exist in a global namespace, which conflicts with the modularity goals of the component model. And the inode part assumes that filesystems have inodes. It's theoretically possible to implement inodes on top of a filesystem that doesn't have them, by maintaining a hash table at runtime of seen files, but that adds a fair amount of complexity and doesn't work if the filesystem supports renaming.

So what if we go the other direction, and remove device and inode numbers?

This would mean the POSIX-required st_dev and st_ino fields would not be provided by wasi-libc, and nor with the d_ino field. To support the common use cases, we could add two new functions, is-same-file and is-same-mountpoint for testing whether two files are the same, or on the same mount point. They wouldn't be drop-in replacements, but many use cases should be able to port to them without great difficulty.

And as an added bonus, this would hide more implementation details that might be leaked in the device and inode numbers.

So this might make sense. But it will mean that codebases that use st_dev/st_ino/d_ino would break when we update to the new APIs which do this, until someone ports them to the new functions.

Thoughts?

I've now submitted #81 to propose removing the device and inode numbers.