whatwg/fs

What is a FileSystemHandle?

a-sully opened this issue · 12 comments

TPAC highlighted a pretty substantial architectural difference between the Chromium and Firefox implementations of this API. Chromium's implementation maps a FileSystemHandle to a file path (which I had attempted to codify a while back), while Firefox's implementation suggests "they are references to objects resolved at creation time."

This has gone unnoticed until this point because no features have exposed this difference to the web.

But there are some significant web-observable implications to this choice, most notably around directory moves. Consider the following code, which moves a directory containing a child we have a handle to:

// Create file at /old/file.txt.
const dir_entry = root.getDirectoryHandle('old', { create: true });
const file_entry = dir_entry.getFileHandle('file.txt', { create: true });

// The file previously pointed to file at /old/file.txt now resides at /new/file.txt.
await dir_entry.move('new');

What happens next?

// PATH-BASED: The promise rejects with NotFoundError.
// REF-BASED:  The call succeeds and returns the File.
await file_entry.getFile();

// PATH-BASED: The promise rejects with NotFoundError.
// REF-BASED:  The promise resolves successfully and creates a new writable stream.
await file_entry.createWritable();

// PATH-BASED: The promise resolves to `null`.
// REF-BASED:  The promise resolves to 'file.txt' (same as before the move).
await dir_entry.resolve(file_entry);

// PATH-BASED: The isSameEntry() promise resolves to `false`.
// REF-BASED:  The isSameEntry() promise resolves to `true`.
const other_file = await dir_entry.getFileHandle('file.txt');
await other_file.isSameEntry(file_entry);

// PATH-BASED: The directory's ID has changed, along with its path.
// REF-BASED:  The directory's ID remains unchanged since before the move.
await dir_entry.getUniqueId();
// PATH-BASED: The file's ID remains unchanged, along with its path.
// REF-BASED:  The file's ID remains unchanged since before the move.
await file_entry.getUniqueId();

// This method is proposed in https://github.com/whatwg/fs/issues/38
// PATH-BASED: The promise presumably will resolve to `null`.
// REF-BASED:  The promise presumably will resolve to a copy of `dir_entry`.
await file_entry.getParent();

// PATH-BASED: getFile() is once again valid and returns the file at /old/file.txt.
//             The isSameEntry() promise resolves to `true`.
// REF-BASED:  getFile() returns the file at /new/file.txt.
//             The isSameEntry() promise resolves to `false`.
const old_dir = await root.getDirectoryHandle('old', { create: true });
const other_file = await old_dir.getFileHandle('file.txt', { create: true });
await file_entry.getFile();
await other_file.isSameEntry(file_entry);

Directory moves very blatantly expose this difference. It will be basically impossible to specify directory moves in a way that's consistent across platforms without being much more specific here. This also has implications for at least the getUniqueId()method and removed handles. Directory moves would also expose a difference in resolve() and a discussed-but-not-yet-formally-proposed getParent() method, among other things.

Looking at the code above, I tend to agree that a ref-based approach leads to outcomes which are more likely to align with user expectations, at least regarding directory moves. It would be nice if moving a directory didn't invalidate all child handles, for example. Meanwhile, there are some instances where a path-based approach arguably makes more sense. What happens to a FileSystemHandle when its underlying file is removed?

​​// Create a file.
const file_entry = root.getFileHandle('file.txt', { create: true });

// Remove the file.
await file_entry.remove();

// PATH-BASED: We can still write to a removed file.
// REF-BASED:  ????
await file_entry.createWritable();

// PATH-BASED: The isSameEntry() promise resolves to `true`.
// REF-BASED:  The isSameEntry() promise resolves to `false`, presumably?
const other_file = root.getFileHandle('file.txt', { create: true });
await other_file.isSameEntry(file_entry);

That being said, it seems reasonable to specify that a handle can still be written to if it's removed via this API, while handles removed by other means (such as by clearing site data or via an OS file manager) could be invalidated.

In summary...

One option is to never specify features that expose this implementation difference, such as directory moves. Unfortunately this is a pretty fundamental difference which I suspect will be hard to paper over as the API continues to evolve. To me, this is a very unsatisfying option. Consider a web IDE which just wants to rename src/foo/ to src/bar/, but is forced to recursively copy all contained files.

Another theoretical option is to accept that there will be cross-browser differences. Sure, moving a directory will invalidate all child handles, but you can re-acquire all the handles within the new directory. However, going down this path is bound to expose many more subtle cross-browser differences. For example, Chromium locks all ancestor directories when a file has an open writable to ensure its path does not change while it is being written to. Having an open writable will block moving a parent directory, which would be a confusing restriction in a reference-based design. This option would be bad for developers and the impacted users, but just listing it here for completeness.

The other option is to specify a requirement that a FileSystemHande is a reference (in the same way I had attempted to specify it’s a path here). That framework could be used to specify new methods such as move(), remove() and getUniqueId() in a way that’s consistent across browsers. This is our preferred option, but supporting this in Chromium would require a pretty substantial re-architecture that I'm hesitant to commit to without clear indication from other browsers and the developer community that handles should be based on references rather than paths...

@szewai does WebKit have an opinion here?

jesup commented

For the case of createWritable() on a FileSystemFileHandle that's been removed, Firefox currently fails the createWritable with NotFoundError, which seems reasonable (we created a FileBlob with a path that doesn't lead to a file -> NotFoundError).

jesup commented

Open writables for us only lock the file, by the way. And with stable identifiers, move() of a file or directory doesn't actually cause a filesystem move, just a DB update.

jesup commented

Note also that resolve() (or whatever we rename it to) is also impacted, of course. With a ref-based approach, resolve() would return the new path; with a pure path-based approach, it would return the original path (which may no longer refer to a file). With an invalidation-based approach, which the currently-written spec seems to take, it should probably throw.

Invalidation would make all file/directoryhandles referring to or under a moved item become invalid, and would throw on most/all attempts to use them (but what about SyncAccessHandles? Thus your lock-on-parent-dirs approach I presume).

Pure-path based would be visible in that given a handles to a, a/b, c and c/b, a->move(d) would make the make a/b invalid, but a subsequent c->move(a) would make the a/b handle valid again (and make c/b invalid). Invalidation would have that as never being valid again. With references, a->move(d) would make a/b into d/b.

jesup commented

I prefer reference-based. It also simplifies thinking about SyncAccessHandles, since in a reference world, it's ok if an ancestor changes name, so there's no need to lock directories above it (and this would be the first idea of a directory lock here, which would be surprising to users).

It seems in normal file system API, there will be two types of objects (classes): path and handle (entry, file descriptor), where path is stateless and handle is stateful. Path can be used for creating/deleting/renaming a file, and handle can be used to modify file content (e.g. c++ FileSystem library).

I think the original design is FileSystemHandle is path and AccessHandle is handle (TPAC 2021 notes). That means FileSystemHandle would not hold a reference to the underlying file but AccessHandle would, i.e. AccessHandle might be usable after file renaming, but FileSystemHandle is not (as its path is unchanged).

However the existing spec seems to indicate FileSystemHandle is handle, e.g. underlying file needs to be existent for creating a FileSystemHandle. I guess that's why we wonder if FileSystemHandle should be reference-based. I think another option to update existing interfaces and make FileSystemHandle actually path (e.g. adding a create function instead of using options in getFileHandle), which is resolved to an object when needed.

jesup commented

If FileSystemHandle is a path, then the option to create the file/dir is odd and doesn't make sense; in that case the option to create should be on createSyncAccessHandle/createWritable.
If it's a reference, then having the create option on it (as we do now) makes sense.

jesup commented

Resolving this issue blocks other work on the spec (and code/wpt's). Is there anyone else that needs to weigh in on this? Any other comments to resolve?

I agree that this is the issue that we need to figure out first. Based on the discussion in #34 and other issues, I think we probably just want to mimic POSIX behavior and API.
That means FileSystemHandle is probably a path (that can be used to create, move, remove files and get file attribute), and FileSystemSyncAccessHandle and FileSystemWritableFileStream is probably fd (reference, that can be used to read/write file content).
We will need some changes on existing interfaces like @jesup mentioned: moving the create option to createSyncAccessHandle/createWritable (these functions might mimic open() in POSIX).

TLDR I'm supportive of the approach @szewai laid out and (barring any major concerns from @jesup) we can file other issues to bikeshed the specifics. Making a decision here would go a long way towards unblocking a number of other issues

I agree that this is the issue that we need to figure out first. Based on the discussion in #34 and other issues, I think we probably just want to mimic POSIX behavior and API. That means FileSystemHandle is probably a path (that can be used to create, move, remove files and get file attribute), and FileSystemSyncAccessHandle and FileSystemWritableFileStream is probably fd (reference, that can be used to read/write file content).

+1

We looked quite hard into re-architecting around references, but outside of the OPFS this raises more questions than it solves and leads to behavioral inconsistencies that, frankly, I think would leave developers frustrated (e.g. say site A and B hold a handle to the same file. What happens to each site’s file handle if the file is moved… by site A? by another application on the machine? while the browser isn’t running? while a copy of the handle is stored in IndexedDB? if the site is registered for file change notifications WICG/file-system-access#72?). I know this spec isn't focused on the non-OPFS portion of the API, but a reference-based approach would certainly increase the effort needed to support even a subset of it (e.g. mozilla/standards-positions#738).

And at the end of the day, it would be quite nice to be able to tell developers "a FileSystemHandle maps to a path” without all the "gotcha"s

We will need some changes on existing interfaces like @jesup mentioned: moving the create option to createSyncAccessHandle/createWritable (these functions might mimic open() in POSIX).

Just to clarify, are you suggesting removing the create option from getFileHandle() and getDirectoryHandle(), or adding it to createSyncAccessHandle() and createWritable()?

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

Adding the create option to createSyncAccessHandle() and createWritable() (in the spirit of matching POSIX open) seems reasonable. A couple follow-up questions that come to mind (which can probably be bikeshedded on another issue):

  • What should be the default behavior if create is not specified?
    • Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases
  • Do we need some sort of "create empty self from handle" for a directory, as well?
    • If so, would it be worthwhile to have a dedicated create() API?
    • For files, this can otherwise be achieved by calling createWritable() and writing an empty file
    • That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes #38 quite appealing

Just to clarify, are you suggesting removing the create option from getFileHandle() and getDirectoryHandle(), or adding it to createSyncAccessHandle() and createWritable()?

I was thinking about adding it to createSyncAccessHandle() and createWritable(), and removing it from getFileHandle() and getDirectoryHandle().

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

One issue about having create option on getDirectoryHandle() is we need to have parent handle to create this file/directory, while touch and mkdir can create file/directory directly from a path. So it seems we should add touch()/create() function on FileSystemHandle instead. This might be helpful if the handle is restored from IndexedDB.

What should be the default behavior if create is not specified?
Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases

It could be similar to open() without O_CREAT flag.

Do we need some sort of "create empty self from handle" for a directory, as well?

Yes, this is my first thought.

That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes #38 quite appealing

I am not sure I understand this. If we allow OPFS root directory to be removed, it seems fine to allow it to be recreated?

Following the POSIX model, these are touch and mkdir, respectively. Removing the option from getDirectoryHandle() would remove the ability to create new directories altogether. Presumably we don't want this... (and deprecating this method would be a significant breaking change)

One issue about having create option on getDirectoryHandle() is we need to have parent handle to create this file/directory, while touch and mkdir can create file/directory directly from a path.

I think this is desirable behavior? touch() doesn't allow you to create an arbitrary path - the parent must exist for the operation to succeed. It's also the default behavior for mkdir() (the -p flag will create intermediate directories, if necessary).

I can get behind adding a create() method (perhaps optionally allowing the creation of intermediate directories), but I don't see the point of removing the option from getFileHandle() and getDirectoryHandle(). This would be a significant hit to backwards-compatibility for not much gain, AFAICT... I'd strongly prefer to not cause that kind of breakage if we don't have to.

What should be the default behavior if create is not specified?
Currently (at least in Chromium), createSyncAccessHandle() will fail if the path doesn't exist, while createWritable() will not. It would be nice if this was consistent - probably by failing in both cases

It could be similar to open() without O_CREAT flag.

SGTM 👍

That being said, re-creation of directories isn't as straightforward. What happens if someone tries to re-create an OPFS root #9 (comment)? There's an argument that you should only be able to re-create a directory if you have access to the parent, which makes #38 quite appealing

I am not sure I understand this. If we allow OPFS root directory to be removed, it seems fine to allow it to be recreated?

I agree that it seems reasonable to allow re-creating an OPFS root - I just want to make we behave consistently in this case:

let root = await navigator.storage.getDirectory();
await root.remove();
let newRoot = await navigator.storage.getDirectory();
await root.create();  // This is a no-op
assert(await newRoot.isSameEntry(root));

I just put up #96, which sketches out what specifying a path-based approach could look like. It's still very much a WIP, but gives an example (with FileSystemFileHandle.getFile()) of what the API's methods could look like if a handle is explicitly associated with a path, which is then mapped to an entry.

Please feel free to leave feedback on that PR. Thanks!