proposal: io/fs: add writable interfaces
matthewmueller opened this issue · 157 comments
Go 1.16 introduced the embed and io/fs packages. The current implementation is a minimal readable filesystem interface.
This is a wonderful first step towards standardizing filesystem operations and providing the community with a lot of flexibility in how we implement filesystems. I would like us to take the next step and define a writable file system interface.
Problem
fs.FS
can't be written to after it's defined.
func Write(fsys fs.FS) {
	// We can't write to fsys: fs.FS has no write methods.
	_ = fsys
}
Optional interfaces could be defined in user-land:
func Write(fsys fs.FS) {
	// We can't rely on vfs.Writable being implemented across community packages.
	writable, ok := fsys.(vfs.Writable)
	_, _ = writable, ok
}
But this approach suffers from the same problem the readable filesystem interface aimed to solve: there is no standard interface across the ecosystem.
Use Cases
I'll list a few use cases that I've come across since Go 1.16, but I'm sure the community has many more:
- A virtual filesystem that you can write to over time. Useful for file bundlers and databases that work in-memory and flush to the operating system's filesystem at the end.
- Write files to cloud storage services like Google Cloud Storage or Amazon S3.
We've already started to see this pop up in the community around io/fs
to address the problem in user-land:
- https://github.com/psanford/memfs: defines fs.MkdirAll and fs.WriteFile
- https://github.com/vedranvuk/fsex: defines file.Write, file.Seek and file.Close.
A quick search on GitHub will yield more community libraries: https://github.com/search?q=%22io%2Ffs%22+language%3Ago. For many of these implementations, you can imagine a useful writable implementation.
Of course, there are many other file system libraries that came before io/fs
that define writable interfaces like afero and billy.
Proposal
I don't feel qualified to define an interface; I know people have thought about this much harder than I have. What I would love to see, from a community member's perspective, is the following:
package fs
func WriteFile(fs FS, name string, data []byte, perm FileMode) error
func MkdirAll(fs FS, path string, perm FileMode) error
func Remove(fs FS, name string) error
Nice to Have: Be able to define whether a filesystem is readable, writable, or read-writable.
func Open(fs fs.FS) (*DB, error) // Readable
func Open(fs fs.WFS) (*DB, error) // Writable
func Open(fs fs.RWFS) (*DB, error) // Read-Writable
Thanks for your consideration!
I'm currently experimenting with writing a file format encoding/decoding/mutating package that is intended to work with files that aren't guaranteed to easily fit in memory.
I would like to implement it in terms of fs.FS
, which would make it so the library doesn't have to care whether these files actually exist on the local filesystem, in memory, or stored somewhere else. In point of fact, this is intended to be an archival format that distributes the contents of a single archive over a configurable number of actual files, and these files might be distributed geographically across different regions for redundancy.
This codec package doesn't want to care about supporting all the different places that the underlying files could be stored. It just wants to take in an fs.FS
and a list of paths.
Additionally, to make this package more testable, using fs.FS
would make it trivial to write tests without having to actually read and write files on disk.
Unfortunately, since fs.FS
is read-only, I'm sitting here thinking up complicated ways that I could support fs.FS
and somehow still support actually writing and mutating files on disk.
I've got a similar issue with my p9 package. It attempts to implement a 9P client and server in a high-level way, similar to net/http
, and while I'd like to rework the file abstraction to use fs.FS
, it currently would result in odd things due in part to Open()
returning an fs.File
, which is then read-only by design. For now, what I'm leaning towards is just having a function that takes my package's filesystem type and returns an fs.FS
that abstracts it away, but the read-only problem will still be there.
Maybe something like this could work?
type RWFS interface {
FS
WriteFS
}
type WriteFS interface {
// Create creates a new file with the given name.
Create(string) (WFile, error)
// Modify modifies an existing file with the given name.
Modify(string) (WFile, error)
}
type WFile interface {
Write([]byte) (int, error)
Close() error
// Maybe also some kind of WStat() method?
}
Then the returned types would only have to expose either reading or writing methods, and the interface would just handle it transparently.
Personally, I think it would be a lot better if there were some way to abstract away the specific requirement of an fs.File as the returned type, so that either Open(string) (*os.File, error) or Open(string) (SomeCustomFileType, error) would satisfy the interface, but that would require language changes and that seems like overkill. It could be partially done with generics, such as with type FS[F File] interface { ... }, but that has some odd potential complications, and it wouldn't be fully backwards compatible at this point.
To me, this looks like the minimum requirement:
package fs
type WFile interface {
Stat() (FileInfo, error)
Write(p []byte) (n int, err error)
Close() error
}
type WriteFS interface {
OpenFile(name string, flag int, perm FileMode) (WFile, error)
}
And another (ugh) for making dirs:
type MkDirFS interface {
MkDir(name string, perm FileMode) error
}
And some helper functions for convenience:
func Create(fsys WriteFS, name string) (WFile, error) {
// Use fsys.OpenFile ...
}
func WriteFile(fsys WriteFS, name string, data []byte, perm FileMode) error {
// Use fsys.OpenFile, Write, and Close ...
}
func MkDirAll(fsys MkDirFS, path string, perm FileMode) error {
// Use fsys.MkDir to do the work.
// Also requires either Stat or Open to check for parents.
// I'm not sure how to structure that either/or requirement.
}
I think that we should lean more heavily on io and os, rather than making top-level WFile
types. In particular, I think we should basically just define some top-level functions that can fall back all the way to Open
, and not have a WFile
interface at all.
Summary
- ErrUnsupported for when implementations are not available
- Optional FS methods:
  - WriteFile(name, data, perm) error
  - OpenFile(name, perm) (File, error)
  - Create(name) (File, error)
- Optional File methods:
  - Write(data) (int, error)
  - Truncate(size) error, for use when FS.Create is being emulated
    - Only used if the file has nonzero size in Create
  - Chmod(FileMode) error, for use when FS.OpenFile or FS.WriteFile are being emulated
    - Only used if the mode does not match after Open
- Helpers:
  - Create(fs, name): try CreateFS, then Open+Stat+Truncate
  - OpenFile(fs, name, perm): try OpenFileFS, then Open+Stat+Chmod
  - WriteFile(fs, name, data, perm): try WriteFileFS, then OpenFile()+Write
  - Write(File, data) (int, error): calls Write or returns ErrUnsupported
Detail
See sketch of an implementation here:
https://gist.github.com/kylelemons/21539a152e9af1dd79c3775ca94efb60#file-io_fs_write_sketch-go
This style of implementation appeals to me because:
- You can check for mutability of a file with a familiar type: io.WriteCloser (or just io.Writer, but File requires Close)
- File mutability, metadata mutability, and FS mutability are all orthogonal
  - Mutable file trees can store immutable files
  - Immutable file trees can store mutable files
  - Not all files in a FS need to have the same mutability constraints
  - Subs could be mutable even though the FS is not
- This keeps the "primary" top-level interfaces the same: FS and File
- Truncate is not required if Create is never used on nonzero-length files
- Chmod is not required if the permissions are correct by default (e.g. by exposing a FixedMode from your fs package)
I think the same patterns can be used to implement Mkdir
and MkdirAll
on a filesystem as well.
I think that also having a Remove
method that can work with a fs.FS
interface would be a great addition (similar to the current os.Remove
function).
Just a small contribution, as it feels to me to be in the scope of this proposal, and since no one mentioned file removal 🙂
@kylelemons just to be consistent with fs.ReadDirFile (the only File extension at the moment) and the File interface from the iofs draft:
type ReadDirFile interface {
File
ReadDir(n int) ([]DirEntry, error)
}
type WriteFile interface {
File
Write(p []byte) (n int, err error)
}
There is a comment from Russ Cox about this exact change as well.
There's an interesting package https://pkg.go.dev/github.com/hack-pad/hackpadfs that's got some interfaces defined for a writable version of io/fs
... The interfaces "feel" pretty much like they should, and are effectively the same as much as what's been discussed here already. I'm in no way associated with that project, just thought it was some interesting work that might inform how writable file interfaces fit into a future release.
Looks like this proposal has the Proposal label but isn't a part of Review Meeting #33502
@rsc can you move it into https://golang.org/s/proposal-status#active column please?
This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group
I haven't thought about this as in-depth as some of the other folks here, but I came up with a similar set of interfaces and shortcuts for an FS backed by both the local filesystem and the Dropbox API. Here's the API (obviously not all of this is so general-purpose):
type CreateFS interface {
fs.FS
Create(string) (FileWriter, error)
}
type FileWriter interface {
fs.File
Write([]byte) (int, error)
}
type RemoveFS interface {
fs.FS
Remove(string) error
}
type WatchFS interface {
fs.FS
Watch(context.Context, string) (chan []fsnotify.Event, error)
}
type WriteFileFS interface {
fs.FS
WriteFile(string, []byte, fs.FileMode) error
}
func OpenDirFS(d string) fs.FS
func Remove(fss fs.FS, path string) error
func Watch(ctx context.Context, fss fs.FS, dir string) (chan []fsnotify.Event, error)
func WriteFile(fss fs.FS, name string, contents []byte, mode fs.FileMode) (err error)
If we add a WriteFile
, I suggest that it should deviate from os.WriteFile
to fix what can otherwise be a subtle concurrency bug.
os.WriteFile
truncates the existing file before writing, which causes otherwise-idempotent writes to race. I suggest that fs.WriteFile
should instead truncate after writing, so that an idempotent file can be rewritten arbitrarily many times without corrupting its contents.
Hello everyone!
Recently I encountered a similar use case where I wanted to dependency inject a filesystem,
and my consumer is required to create, modify or remove files on the fsys.
I came up with the following header interface that closely follows the os
stdlib:
type FileSystem interface {
Stat(name string) (fs.FileInfo, error)
OpenFile(name string, flag int, perm fs.FileMode) (File, error)
Mkdir(name string, perm fs.FileMode) error
Remove(name string) error
}
type File interface {
fs.File
fs.ReadDirFile
io.Writer
io.Seeker
}
I made an interface testing suite to cover the ~127 most common filesystem interactions from a behavioural point of view.
Then I made two suppliers for this interface:
- filesystems.Local: a local file system with an optional jail root path. It uses the os package functions.
- filesystems.Memory: an in-memory file system variant that could be used for testing purposes.
The same interface testing suite tests both.
You can use them as a drop-in replacement for places where you had to use the os package.
The filesystems package also supplies functions similar to those in os, such as Open, Create, ReadDir, and WalkDir.
I plan to use and maintain this until an easy-to-use replacement comes out in the std library.
I just wanted to share this with you all, hoping it helps someone out.
Cheers!
I'd write like this:
type WritableFS interface {
fs.FS
OpenFile(name string, flag int, perm fs.FileMode) (WritableFile, error)
}
type WritableFile interface {
fs.File
io.Writer
}
func Create(fsys WritableFS, name string) (WritableFile, error)
func WriteFile(fsys WritableFS, name string, data []byte, perm fs.FileMode) error
...
On top of a +1 for the need for a writable fs interface, I'd like to enter into the conversation that files are not inherently readable or seekable. For example, files may be opened with fs.ModeAppend, or be streams like os.Stdin.
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.
Any writable file implementation could return an error for read/seek operations... but it'd be nice to express that they're not necessary.
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.
@chrisguiney there are already io.Writer
, io.Seeker
, and io.WriteSeeker
interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.

@chrisguiney there are already io.Writer, io.Seeker, and io.WriteSeeker interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.
These interfaces are for writable "things" like files, but the requirement in this issue is for a file-system-level interface, not a "file"-level interface, so this does not seem to answer the need, does it?
Personally, I don't think there is a practical way to support a writable fs.FS
. Or it's at least extremely non-trivial.
When it comes to reading files, OS-dependent semantic differences can mostly be papered over by handwavingly assuming the fs.FS
is immutable. Once you actively encourage writing to the underlying filesystem, though, you open an uncontainable can of worms of semantic deviations between different operating systems and even filesystems on the same OS.
Questions which immediately arise are:
- What happens if a file is opened for reading and for writing simultaneously? Windows (AFAIK) refuses this, returning an error, while most unix-like OSes will allow it. If we say it is forbidden, how do we forbid it under Linux? If we say it is allowed, how do we allow it under Windows? If we shrug and pass the buck, how are programmers supposed to deal with it?
- What kind of atomicity guarantees are given? When a power loss happens (or the process is killed by a signal) during a Write, what are the allowed states for the resulting file to be in? This is entirely unspecified, even within a single OS. It heavily depends on the FS and the settings it's mounted with.
- Similarly, what about Flush? Again, in practice it is mostly unspecified what this actually does (in the presence of a crash), but it's most likely going to need to be supported.
- How do renames behave if the target file exists, or source and target are the same file, or one is a symlink to/below the other?
- Speaking of symlinks, what about them? What about unix permission bits, and what on systems lacking those? What about extended attributes? What about SELinux etc.?
- What about flock?
- How do you handle differences in allowed file names on different OSes?
- An interface like WriteFile might be reasonable to support, but even that is difficult to do cross-platform, and even then requires different APIs than just a simple "dump this please" to be reliable.
As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".
Of course, it is possible to just provide some API and tell the programmers that they can't actually rely on the data actually being written, so it shouldn't be used in any production use case. But that just feels irresponsible.
@Merovius isn’t the os package such an abstraction? Would you consider that irresponsible/not safe for prod?
Yeah, I’m not convinced by any of that @Merovius.
The write interface doesn’t need to support every imaginable operation to be useful, and of the operations it would support, Go is well known for taking a pragmatic approach, even when it leads to correctness issues in some circumstances, and this pattern is all over the file interfaces Go already provides. Windows doesn’t support Unix permission bits, so Go just… does whatever it feels like on Windows.
Maybe we shouldn’t use Go in production? I disagree.
A lot of your questions would be up to the implementation to decide. If the implementation is transparently delegating to a real OS filesystem, the answers would mirror what we currently see in the os
package… it would be nothing shocking at all. If the implementation is more abstract, it can and will do whatever seems reasonable, and people will report bugs against that package’s repo if they disagree. That’s how all interfaces work.
@Merovius Perhaps the way to think about this is to concentrate on the cases that do not involve the operating system. After all, we already have a way to write operating system files. What may possibly be helpful is a way to view something like a zip file as a file system that we can write to. That would let code write out a tree of files as it sees fit, with the zip file system arranging for everything to flow out correctly at the end. And then the same code would work for tar.
If we take this approach, then I think the next step to ask is: do we need to support a read/write file system? Or should the file system be write only? Because I think a lot of the concerns go away with a write only file system.
So is there a use case for a read/write file system interface?
I think it's generally fine to gloss over the semantics of how a write will behave on any given os/fs/hardware. Programs already vary in behavior by using *os.File
and related os
functions.
The real power a standardized interface provides is being able to easily swap out the usages with implementations that do make the semantics well defined. The biggest use case I have: testing. I need to know how my program behaves if opening a file fails, or hangs indefinitely. Perhaps testing makes for a simplistic example, but those are more well-defined semantics than what you'll get from the os package. If I can test how my program behaves under any arbitrary condition or state the fs may present at runtime, it's actually easier to make a system that behaves consistently across different platforms.
As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".
While I've also shared that advice, I'd like to point out two things:
- it's not such a helpful thing to tell someone writing the database itself.
- even databases get it wrong (because fsync is that undefined)
@hherman1 Yes, I believe the os
package is such an abstraction and yes, it has many of the same problems, stemming from "writing files is subtle". See for example @bcmills comment here. Its saving grace, if any, is that it's a relatively thin abstraction. So it's possible to use it and stay relatively close to the OS (e.g. by using OpenFile
in conjunction with x/sys/unix
for flags). If we're trying to be as broad in applications as io/fs
, the abstraction needs to be thicker, hide more of the differences and restrict itself further to the least common denominator.
So maybe that's surprising to people (and your question was rhetorical), but I do more and more think that using os
to write files is questionable in production use. I think it's okay for some use cases, e.g. if you don't do it a lot or if it happens in an interactive program (so it's at least obvious and can be verified if something goes wrong). Otherwise, yes, I think people should not use os
directly, but should use wrappers like renameio or an embedded DB.
@ianlancetaylor I believe people will invariably expect os.DirFS
to support a read/write interface.
- it's not such a helpful thing to tell someone writing the database itself.
- even databases get it wrong (because fsync is that undefined)
For 1: Those people shouldn't use io/fs
or even os
, but likely use x/sys/unix
especially because they have to be careful. And 2: Yes. I was alluding to that above. I think that makes it a worse idea to try and do it yourself, not a better.
In any case, I think it's probably possible to create a good abstraction. It's just that it's subtle, and it doesn't seem that anyone has considered the subtlety involved.
@Merovius no my question wasn’t rhetorical, I meant it. Wasn’t sure how to make that clear here.
AFAICS point 1 is not correct: Windows supports opening a file for read and write simultaneously by passing GENERIC_READ | GENERIC_WRITE
to CreateFileA
.
@fgm The way I read those docs, it depends on what is used for dwShareMode, and also on what other processes have opened those files with. But in any case, that list was not meant as a specific list of questions to answer, but more as a list of examples of the kinds of questions that arise when you try to create a cross-platform API for writable files. I'm sure there are answers to more than one of them. I'm less optimistic that more won't crop up over time or that we won't discover new problems over time by using whatever API we box ourselves into by adding it to the stdlib.
But maybe I'm wrong. I just wanted to express my concerns. And make clear that the problem isn't as easy as adding a Write
method to fs.File
.
As mentioned by @hairyhenderson, this issue is less about how to have a writable fs.File
and more on how to have an fs.FS
equivalent that allows creating such files.
I think an approach similar to gocloud.dev/blob would be a good match here:
- fs.FS is read-only and returns fs.File, which is also read-only. File systems that allow reading should implement this.
- fs.WriteFS would be write-only and return probably an io.WriteCloser or some fs.WritableFile that aggregates multiple interfaces.
If necessary, there could also be an fs.ReadWriteFS
for file systems that do allow reading and writing simultaneously, returning an interface that aggregates both the return types above.
Example (I didn't give much thought to the names):
type WritableFile interface {
io.WriteCloser
// Maybe other methods and interfaces
}
type WriteFS interface {
OpenWritable(name string) (WritableFile, error)
}
type ReadWriteFile interface {
WritableFile
File
}
type ReadWriteFS interface {
OpenReadWrite(name string) (ReadWriteFile, error)
}
Note that in this case a file system can implement fs.FS
and fs.WriteFS
while not implementing fs.ReadWriteFS
if a file can only be opened in one of the modes at a time.
This also fits with the existing io interfaces (Reader
, Writer
, ReadWriter
, etc.)
OK, I'll probably get flamed for this, but what's so wrong with something like the PHP stream wrapper feature, adapted to Go syntax, obviously? https://www.php.net/manual/en/class.streamwrapper.php
@fgm I think that is basically what we are talking about here.
I believe people will invariably expect
os.DirFS
to support a read/write interface.
Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.
Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.
I don't understand. By server/client are you referring to the implementor/consumer of the interface?
I don't understand. By server/client are you referring to the implementor/consumer of the interface?
Yes.
The proposal in this issue basically overlays an optional write interface over the existing read interface. I'm questioning that. Are there examples of programs that want to use io/fs and want to use both the read interface and the write interface? Or would programs be satisfied with an fs.WriteFS that supports Create to get a WritableFile that implements io.Writer?
One example: implementing a mutable archive format. With the current read only abstraction, it is possible to let users transparently swap between accessing an actual directory on disk and accessing a directory that is compressed into an archive. But what if they want to edit a file, and my library wants to support this? I have to make them write library-specific code to accomplish this, and then they have to switch between implementations on their own. They can’t just use a standard abstraction that works for directories, archives, S3 buckets, etc.
Just creating new files and flushing content to them isn’t really enough for the use cases I envision.
It seems like most use cases would be satisfied with a relatively simple interface that allows opening existing files, creating new files, deleting files, as well as reading, writing, and seeking on any file. More advanced functions / optimizations could be layered on with optional interfaces in the future, but simple seems like the right place to start.
os.DirFS
would have no trouble supporting this additional interface that offers mutability. It would then be up to libraries to decide how to implement this new interface, if at all.
Right now, fs.File
already provides optional interfaces, so making Writer
an optional interface too doesn’t seem like too much:
A File provides access to a single file. The File interface is the minimum implementation required of the file. Directory files should also implement ReadDirFile. A file may implement io.ReaderAt or io.Seeker as optimizations.
Then it would be a matter of defining a new fs.WriteFS
that primarily extends fs.FS
with create and delete functions, but possibly another Open
function for handling the distinction between a read-only Open and a mutable Open, kind of similar to what OpenFile in the os
package provides.
I’m sure nothing I’m saying is very original, but that’s my summary of this issue.
Maybe we can restrict the interface to support only writing new files (io.WriteCloser) and atomic appearance into the directory namespace on close. That would align well with object storage scenarios as well as be supportable by os layers.
To cooperate with readers, I would suggest handling it with a pipe-like API:
readFS, writeFS, err := os.OpenFS(dirname)
That allows keeping the reading part compatible, enables a more complex write layering, and allows for separate open methods for read and write scenarios.
As someone who primarily needs this for making writing testable code less work, my view is that the interface does not need strict definitions whatsoever, and can openly admit that the details are going to be implementation-specific. I don't want to test against the intersection of peculiarities of all OS/FS permutations. I just want a glorified map[string][]byte
.
What really makes this valuable (or would make it valuable) is the presence of both os.DirFS
and an existing "make-believe filesystem" implementation for tests (which is missing for unknown reasons) in the standard library. What @adamluzsi suggested in May would be completely sufficient. And even if it wasn't - let's say, for example, I needed file locks (i.e. advisory file locks on Linux and exclusive file locks on Windows) - I could always define my own interface which extends the present one, then write an implementation for runtime which redirects to os.DirFS
and an implementation for tests which embeds whatever the according type that implements the "glorified map[string][]byte
" would be called.
Is my perspective on this whole problem missing anything big?
my view is that the interface does not need strict definitions whatsoever, and can openly admit that the details are going to be implementation-specific.
Then why have an interface in the first place? You can just use os
directly.
an existing "make-believe filesystem" implementation for tests (which is missing for unknown reasons) in the standard library.
One more use case is for unit tests for functionality that writes to disk.
@noel-yap Shouldn't the test use the same facility the functionality uses? i.e. without a writable io/fs
interface, the implementation uses os
, so the test should use os
as well. That is, I don't see how this is an additional use case.
I think it's reasonable not to, @Merovius. The tests aren't testing whether or not the os
package works. They're testing that whatever they're implementing does what it's supposed to with regards to a filesystem. It's no different from using an io.Writer
with a *bytes.Buffer
in a test while the actual implementation uses, say, a *os.File
or net.Conn
.
@DeedleFake This is starting to get off-topic, but: In that example, is the value in writing tests which don't use *os.File
, or is the value in having an abstraction for the relevant behavior that you can pass to the function? I'm arguing the latter. If you don't have that abstraction - for example, say you use ioctl
s - then you absolutely should use an *os.File
in your test. And enabling you to write the test is not a good reason to add interfaces for ioctl
s, if we don't need that interface for its own sake. And the usefulness of io.Writer
has little, if any, to do with writing tests - it can be used to write tests using a *bytes.Buffer
, but that's not why you'd use it.
If you need to test behavior of functions writing files to disk, (*testing.T).TempDir()
exists and it's not particularly less convenient than a hypothetical writable os.DirFS
would be, in my opinion.
@DeedleFake, if you need FileSystem interaction with writing support, but you don't need to scale this FS-interaction-based solution out via the process model, then coupling with the os
package might be enough for you, as @Merovius suggests.
My use case requires an FS component because it would feel unnatural to represent it using an entity repository interface.
But the local FS is a bottleneck in scaling with a process model, so I use an FS interface to make it possible to supply a scalable virtual FS solution to my application. This also makes testing easy because the writing support enables me to have testing arrangements in my testing scenarios using business logic that has the FS as a dependency.
I took a look, but it's too simple. For example, it has no equivalent to filepath.Walk()
. One popular library which can do that is afero
, but most of it seems to be on life support.
In that example, is the value in writing tests which don't use
*os.File
, or is the value in having an abstraction for the relevant behavior that you can pass to the function?
The value is in reducing dependence on and interaction with the environment. While interactions cannot be fully eliminated (if a test fails because the machine running the test spontaneously explodes, that's usually something out of the developer's control), any step towards it helps in making sure that tests test code instead of things which are not code. It's an idealistic argument, but it stands on valid ground. Writing an abstraction over an environmental dependence (e.g. the file system) that mimics it under idealized conditions, removing any avoidable environmental effects that could otherwise cause a test of correct code to fail (e.g. another program running at the same time writing the files we are attempting to read), brings us one step closer to the ideal. It makes a great combination with the concepts of mocks, spies and stubs, but there is no good way to have those in Go, which only serves to make our need for proper abstraction even greater.
For example, it has no equivalent to
filepath.Walk()
.
It is exactly what you asked for: An in-memory, writable implementation of fs.FS
based on a map
, for use in testing. You can use fs.WalkDir
to walk it.
@Merovius We've come full circle then: fstest.MapFS
could work if fs.FS
(which it implements) defined the methods needed for opening writable files. As it stands, the type may be mutable, but it is not a writable file system. But then fs.FS
aspires to be general purpose, and I don't see how the idea of a one-for-all file system interface can be made a reality without imposing restrictions and feature-querying to the point of complete unfeasibility. Even if we get it, I am doubtful we will live to see its implementations, of which there would need to be many, even without counting any of the use-cases OP is mentioning.
The tests aren't testing whether or not the
os
package works. They're testing that whatever they're implementing does what it's supposed to with regards to a filesystem
I'd like to offer a different perspective on this; perhaps this is what you meant and I'm just taking this too literally. Testing whether package os
works on top of your own code is not an error or otherwise a bad move. The opposite is the case: if there are behaviors in os
which cause a test to fail which then leads you to finding a bug in your code, and this behavior may have been omitted from a potential os
abstraction, then that's a huge win. After all tests are green, your code is "ready for production", where os
will be used directly. The problem with using os
directly is that it exchanges a lot of hot kisses with the environment, such that you can enter situations where a behavior of os
causes a test to fail which however was not triggered by a bug in your code. Thus you enter a bug hunt in a spider's web and wonder why there seems to be nothing in need of being squashed. That's why @Merovius suggests using os
directly. A more contrived situation, where using os directly could cause a correct test testing incorrect code to pass when it shouldn't, is also conceivable.
Then why have an interface in the first place? You can just use
os
directly.
Perhaps the biggest issue with doing what you suggest is that you cannot proceed in good conscience without taking care of isolation. If the code you are testing involves causing data loss (intentionally or unintentionally) then you don't want it running directly on your machine or anywhere else which cannot be easily replicated by clicking a button in your CI/CD suite of choice.
I digress. What I am on about is different from the general purpose file system interface this issue is about. Unless there are other factors which I have missed, I rest my case.
@MMulthaupt We indeed came back full circle, because we are apparently talking past each other.
- You said you wanted a writeable fs.FS to enable writing tests.
- I said this is not a new use case, as you either a) need to use a writeable interface in the code under test (i.e. an existing use case) or b) only need a way to populate a read-only fs.FS to pass to the code under test.
- You claimed that the stdlib lacked a way to do the latter, which is wrong, as fstest.MapFS is that.
So I don't understand why we're back here.
I said this is not a new use case
I think I've lost the thread here -- can you please clarify: new in relation to what? go 1.20, or this proposal?
a) need to use a writeable interface in the code under test (i.e. an existing use case) or
I'm probably misunderstanding you, but this proposal exists because the existing fs.FS
interface doesn't support a writable file system.
Are you trying to say that existing code being tested would already need to be accepting an io.Writer
or similar interface? I'm having a hard time grokking what you're trying to communicate.
b) only need a way to populate a read-only fs.FS to pass to the code under test.
- You claimed that the stdlib lacked a way to do the latter, which is wrong, as fstest.MapFS is that.
I don't believe this is correct.
- fs.FS has a single method, Open(name) (File, error)
- File only conforms to io.Reader, not io.Writer
- the fstest.MapFS implementation returns a fstest.openMapFile, which does not implement io.Writer
There's also a lot of value in being able to test how code that uses a writable filesystem might behave in the face of different errors. e.g., testing what occurs when creating a new file fails, or if Write([]byte) (int, error)
returns an error
Really, this is kind of escalating from a pretty minor point. It's not even really worth discussing, but people keep asking me to clarify it, or misunderstanding it, so I feel compelled to keep arguing about it.
I think I've lost the thread here -- can you please clarify: new in relation to what? go 1.20, or this proposal?
This discussion. "Being able to test a function" is not a reason to create an interface. The value io.Writer
adds isn't that it allows you to test a function writing to a file - it's that it allows you to write a function that can write to non-files just as well as files.
This seems like a fairly clear statement, even if it is a minor point. And "don't write interfaces just so you can use them in a test" is also pretty well-established advice and policy in the Go ecosystem.
I don't believe this is correct.
I don't know what to say to this. You quoted me saying "only need a way to populate a read-only fs.FS" and then made some statements about how fstest.MapFS
is read-only. I don't understand how I'm supposed to defend the statement that fstest.MapFS
allows you to pass a populated read-only fs.FS
to a function, because it seems like an extremely self-evidently correct statement.
There's also a lot of value in being able to test how code that uses a writable filesystem might behave in the face of different errors.
I do agree there is some value in being able to inject failures into filesystem interactions. This is just the general case for mocking. Whether or not mocking is a good testing strategy is a recurring, contentious debate and I don't think we will successfully litigate it here.
However, we don't need to. You can write a mock version of os.Open
or os.Create
just as well as you can write a mock fs.FS
.
So I don't understand why we're back here.
We're back here because of three reasons:
- We had different understandings of what was meant by "glorified" when I said I wanted a "glorified map[string][]byte". In my use case, during tests, both the code being tested as well as the code of the test need to perform reading and writing operations on the same file system. That means I need an Open() method which can return writable files, as @chrisguiney points out, which neither fstest.MapFS nor os.dirFS have. You are correct in saying I could build that; however I had hoped to be able to use existing standard library code.
- I prioritize isolating tests from the environment.
- You prioritize idiomatic code.
Again, digression. This issue is about a generic file system, which may have new concepts such as a mechanism to query the available permission model, the guarantees provided and not provided by file locks, even whether the Kernel code will silently drop IO errors, etc.
I don't currently need that. I am working with os
today. So the abstraction I need is an abstraction oriented around os
.
While it does seem there is a lot of discussion around testing, I have thought about a different use-case; a minimal writable file interface for simple applications:
- Right now, we have "emulate all of POSIX/NTFS/APFS or nothing", which is a lot to deal with. If someone does any work by emulating file-stuff in os, Hyrum's Law will strike and someone, somewhere will have expectations about concurrent writes, file-locking, seeking, etc.
- I'd argue many applications don't need much beyond what fs.FS + some kind of non-concurrent version a la Create(fs, filename)/Write(fs, filename, io.Reader) offers.
- Would allow many interesting (and novel?) file-systems to be implemented in pure Go.
- Simplify testing of business-logic working against such a writable FS.
Inspirations could be FUSE or Gnome's GVfs that supports quite a few schemes.
My particular use-case is messing around with quite large Git repositories, where I need to do fairly small automated edits; I'd love to have something with filesystem-like interface for abstracting away "sparse" git-checkout business:
// Open Git repository
git, _ := gitfs.Open("git@github.com/msiebuhr/foo.git")
defer git.Close()
// Write a file - the exact interface isn't important to me.
f, _ := fs.Create(git, "some/filename.json")
f.Write([]byte(`{"some": "json"}`))
f.Close()
// Commit and push!
git.Commit("Update some json")
git.Push("some-branch-name")
And by extending a writable fstest.MapFS, I'd be able to test the business logic without having to stub out a "real" git repository (which is also becoming a pain-point of its own).
Hyrum's Law:
With a sufficient number of users of an API, it does not matter what you promise in the contract:
all observable behaviors of your system will be depended on by somebody.
My particular use-case is messing around with quite large Git repositories
IMO this is where writable io/fs
interfaces come in handy.
To add some personal flavour, while developing go-fsimpl
I toyed a few times with adding write functionality to gitfs
and other filesystems, but haven't (so far) due to the lack of appropriate interfaces in io/fs
. It seems awkward to provide compatible read-only filesystems, but to bolt on write support in a way that I'd likely have to change later.
That being said, it doesn't feel that awkward, and I may yet just experiment with the interfaces defined by hackpadfs
, which I believe are effectively what we need.
As to the writable fstest.MapFS
discussion, maybe try out hackpadfs/mem
for a taste of how it could work?
As a suggestion: Instead of having a writable fs.FS
, the corresponding "sinks" could also read from an fs.FS
. That's a bit similar to implementing io.ReaderFrom
instead of io.Writer
.
So e.g. zip.Writer
could get a method func (w *Writer) ReadFrom(fsys fs.FS) error. The advantage is that such an fs.ReaderFrom could easily decide for itself which attributes and file types it supports under which semantics, and it wouldn't have to worry about nailing down what happens with concurrent or interrupted writes.
I don't think that solves the same problem. The point of a writable interface is so that you could, for example, hand the return from os.DirFS()
, a *zip.Writer
, or some implementation of an fs.FS
that handles some custom HTTP-based API to something and it could just write to it in the same way. Reversing that now requires that the end that is doing the writing act like a readable filesystem. It's possible, but it would be really awkward to work with.
The point of a writable interface is so that you could, for example, hand the return from os.DirFS(), a *zip.Writer, or some implementation of an fs.FS that handles some custom HTTP-based API to something and it could just write to it in the same way.
That is entirely possible with what I'm proposing. Just like anything that accepts an io.Writer
could also accept an io.ReaderFrom
. An easy way to see that is that you can implement io.Writer
in terms of io.ReaderFrom
and vice-versa.
So I don't really get what you are saying.
It's possible, but it would be really awkward to work with.
Can you give an example? It doesn't seem particularly awkward. And any awkwardness it does have seems, to me, be systematic. i.e. it's expressed in exactly the kinds of questions we would have to answer in general to define a writeable interface.
FWIW I get that it might not be what you (or other people on this issue) have in mind. But personally, I think it's a pretty good solution to at least the use cases I recently had for writeable filesystems. I think it's worth trying it out.
I can imagine two categories of algorithms where the inversion of the writer responsibility is not feasible:
- If the existence or the content of a file depends on prior processed files (or data)
- The existence or the content of a file depends on un-owned data
Examples:
- An algorithm which packs files together and chunks those pack files to not exceed a max size. Due to different effects on compression, such an algorithm must preprocess the entire data set to create a readable FS structure. Because the order of access to the files cannot be determined, a wrong access pattern leads to an entire reprocess.
- An algorithm which transforms (remote, perhaps even read-once) data which it cannot control. To provide a readable view on that, the data stream must be captured and temporarily written somewhere else. Because we don't have an abstraction, the only solution would be to write that into the local filesystem. This won't work on cloud nodes with no or limited storage.
@torbenschinke You can very easily invert that argument. For example, a git tree object (mentioned as an example above) requires the written files to be in depth-first, lexicographic order. So to implement a writable FS interface, a git repository would have to persist every directory to temporary storage, in case files are written out of order. On the other hand, this is straight forward to implement using an fs.FS
- you just call ReadDir
recursively in post-order.
@Merovius Don't get me wrong, I find your idea of applying the ReaderFrom pattern to the topic a fresh view and also a good approach. I can think of a lot of scenarios, where this would be a very elegant solution indeed.
But the concern is, that a one-size-fits-all API is not enough to support efficient or natural implementations in practice in all cases.
But the concern is, that a one-size-fits-all API is not enough to support efficient or natural implementations in practice in all cases.
That's exactly why I'm against this proposal. It presumes you can define a one-size-fits-all writeable FS API. That still seems dubious to me, given just how much different "file systems" vary in terms of what they require/guarantee when writing.
As it stands, the readable fs.FS currently represents the only common one-size-fits-all solution. I believe that we can offer a complement with this proposal, and that's why it makes sense on its own. I agree with you that most APIs can never be considered a 100% solution, just as the POSIX file system spec doesn't cover everyone's requirements.
As another data point: It has been indicated above, that adding a WriteFile like method to the interface would be enough to satisfy the requirement of "a simple writable FS interface". In that case, I'd argue that interface { ReadFrom(fs.FS) }
is also enough, as you can implement a simple WriteFile
helper working with that.
That's exactly why I'm against this proposal. It presumes you can define a one-size-fits-all writeable FS API.
Of course you absolutely can. That API already exists in the form of the OS's filesystem-related syscalls. The point of fs.FS
as it currently stands is that it provides a standardized abstraction that looks like the read parts of the OS API but could have a different underlying implementation. All this proposal is asking for is the exact same thing for the write parts of the API. That's not such a strange thing at all. And yes, there are things which won't make sense with that API, in which case they shouldn't try to fit such an abstraction, exactly the same as the existing fs.FS
when something makes no sense acting as a readable filesystem.
@DeedleFake I do not understand how what you are saying relates to the context of the statement you are quoting (which is the difference between a Writer
-semantic API vs. a ReaderFrom
-semantic API).
This issue has been lingering in part because we don't know how to move forward with it.
Above there is a link to my comment #41190 (comment) which frames the fundamental problem: writing is a concrete operation full of detail that must be included, while reading is an abstract operation that can elide detail without affecting users of a more detailed interface.
The reason we added io/fs was so that existing code that reads from a tree of OS files can be written to read from an abstract tree of files instead. Examples include template.ParseFS and http.FileServer+http.FS.
We would want to add a writable form of io/fs to allow existing code that writes to a tree of OS files to instead write to a tree of abstract files stored elsewhere. But what examples are there of code that writes to a tree of OS files? I can't think of any in the standard library.
We would want to have those examples to guide any discussion of a writable io/fs. If there are no such examples, then that answers the question.
There was talk in #54898 about adding a reverse operation that would extract a *zip.Reader
into a writable fs.FS
equivalent. I think some abstractions about copying both individual files and entire trees from an fs.FS
to a writable one would be quite useful. In fact, with a writable FS, a *zip.Writer
could be made to implement that new interface and a copy operation could handle the case that #54898 is intended for instead.
For myself, I have a 9P implementation that I made a while back and while I haven't done anything with it in a while, my intention has been to at some point make it implement fs.FS
. I've had some hesitation in doing so, however, because of the lack of a writable interface. If one was available, the whole abstraction in general would map to something like a 9P implementation much better than a purely read-only one.
https://github.com/go-git/go-billy (and users of it, such as go-git) might be instructive for some (non-stdlib) example use cases.
The existing functions in os
are already an abstraction layer; there aren't that many of them, so I struggle with the arguments that they can't be added to an interface.
As a developer, I love io/fs
, but hate that Go now has different APIs for read-only vs. writable directory trees (io/fs
for read-only if you want to support e.g. embedding and in-memory trees for tests, and os
for read-write). I'd like to see the two APIs merged, i.e. just be able to pass a fs.FS
to functions doing file operations, without having to worry about the distinction.
For a concrete example of where having two APIs causes pain: os.DirFS
doesn't like leading slashes in pathnames. I've gotten bitten by this when converting code between io/fs
and os
—it introduced subtle bugs outside of code that had to be rewritten to use one or the other.
The existing functions in
os
are already an abstraction layer; there aren't that many of them, so I struggle with the arguments that they can't be added to an interface.
Note that the fact that os
is already an abstraction layer does lead to real problems - before you add things like "git repositories" or "zip files" into the set of things to abstract over. For example, it is generally impossible to handle an error returned by WriteFile, which is probably the simplest "writable interface" you can imagine. If an error is returned, that might mean the file does not exist, it exists in a partially written state, it exists but contains entirely different content, it already existed beforehand, it existed beforehand but no longer exists…
There exist abstractions which are WriteFile
like but provide some more guarantees (e.g. by creating a temporary file and renaming it over the target) but they tend to pretty heavily depend not just on the OS, but also on the FS. e.g. under some Linux filesystems, you can use O_TMPFILE
to get relatively clean semantics, but outside of Linux…
Personally, I'm having trouble seeing the value in an abstraction that says "if no error is returned, you're probably good - if there is, all bets are off".
But what examples are there of code that writes to a tree of OS files?
@rsc shouldn't the existence of multiple alternative non-stdlib writable filesystem interface modules be enough in terms of examples? Simply looking at which modules import those should provide plenty of data... (I'm thinking of billy, afero, hackpadfs, etc)
@Merovius I would love to see a proposal for better error return values (e.g. making it possible to differentiate between the error cases you mentioned using errors.Is
), but I'm not sure that proposal needs to hold up this one.
I don't know much about this stuff, so apologies if I get this wrong. But it sounds like the use cases for writable io/fs are for abstract file systems. When they are used to write to a real file system, there is a layer in between that is specifying aspects that are not part of the abstract file system, such as file ownership and permissions.
If that sounds right, and it may not, then perhaps we should be thinking in terms of a writable io/fs layer that only exposes the simplest possible operations, which I think is Create
to get a WritableFile
that implements Write
. And, possibly, Mkdir
, but only with a name, not a permission.
We should provide an implementation of that interface that lives entirely in memory. And we should make it easy for users to write an implementation that translates the calls into operations on some other file system, adding in the details that that file system requires. The os package, for example, could provide an implementation that uses user-defined function hooks to tweak the actual calls.
what examples are there of code that writes to a tree of OS files?
The cmd/go
script engine comes to mind:
https://cs.opensource.google/go/go/+/refs/heads/master:src/cmd/go/internal/script/state.go;l=143-177;drc=48a1dcb92778a349e13bcb8be10a40047f0cf7d1
Another example that is a bit less compelling: testing. I occasionally write code which migrates different file trees of config. It creates/deletes/modifies/reads files. It would be nice to be able to use a simple map[string]string as the file system in tests, as it would be easier to inspect and construct than a TempDir file tree.
Speaking of, aren’t deletes an important operation too @ianlancetaylor? Or are the semantics weird for them in some way?
@ianlancetaylor that's a start, but incomplete. Ownership and permissions aren't exclusively a local filesystem concept - some object stores also have these concepts...
In my case I'm working on a composable virtual file system which has Unix semantics (permissions, owner ID, timestamps). At the moment the best abstraction available is billyfs. I do have an io/fs compatibility layer but due to the current writable limitations I can't use it everywhere yet.
If that sounds right, and it may not, then perhaps we should be thinking in terms of a writable io/fs layer that only exposes the simplest possible operations, which I think is Create to get a WritableFile that implements Write. And, possibly, Mkdir, but only with a name, not a permission.
I think the Create
, Mkdir
, and MkdirAll
operations in io/fs
should have the same signatures as the ones in the os
package — primarily to make it easier for existing users to migrate from os
to io/fs
, but also to allow implementations that do support os
-style permissions to implement them.
I think the “writable filesystem” parts of os
would be covered by four FS
extension interfaces and three File
interfaces:
type OpenFileFS interface {
FS
OpenFile(name string, flag int, perm FileMode) (File, error)
}
type ChtimesFS interface {
FS
Chtimes(name string, atime, mtime time.Time) error
}
type RemoveFS interface {
FS
Remove(name string) error
}
type MkdirFS interface {
FS
Mkdir(name string, perm FileMode) error
}
type ChmodFile interface {
File
Chmod(mode FileMode) error
}
type ChownFile interface {
File
Chown(uid, gid int) error
}
type TruncateFile interface {
File
Truncate(size int64) error
}
Each of those extensions could be implemented independently, so filesystems could (for example) support Truncate
but not Chown
or Chmod
, could support creating files but not directories, creating files and directories but not removing them (an “append-only” filesystem), and so on.
@bcmills We could go that route, but what should be the approach for a writable file system that doesn't support permissions? Should a call to OpenFile
fail if the permissions are not 0o777
?
What does Go currently do if you try to open a file on a FAT filesystem, which does not support permissions? I think a lot of things are going to be implementation specific, since they already are.
For filesystems without fine-grained permissions, I think we should follow the Windows Chmod
behavior of the os
package:
On Windows, only the 0200 bit (owner writable) of mode is used; it controls whether the file's read-only attribute is set or cleared. The other bits are currently unused. For compatibility with Go 1.12 and earlier, use a non-zero mode. Use mode 0400 for a read-only file and 0600 for a readable+writable file.
It would be up to the implementation to decide whether to ignore unsupported bits or explicitly reject them.
- For example, an in-process filesystem necessarily only has a single user, so it could safely ignore the “group” and “other” permissions without introducing any security issues.
- OTOH, if a filesystem does not support read-only files at all, it could return an explicit error if the “user write” permission bit is not set when the file is created.
@ianlancetaylor it would be up to the fs implementation to decide what to do with permissions just as it's currently up to the OS to decide what to do with permissions when calling os.OpenFile.
All that's needed is a common set of interfaces
At the risk of sounding like a broken record: IMO stuff like this (what to do about unsupported features or what happens in case of errors) is absolutely part of the "interface", even if it is not part of the type. For example, io.Writer
specifies that the implementation must not modify its argument and do the entire write in one go and io.Reader
specifies when and how to return io.EOF
and that data should be processed before checking the error.
Explicitly deciding to have it implementation defined is one thing - though I'd find that rather useless; IMO it invalidates the entire point of having, say, a Copy(to, from fs.FS) error function, if it may just croak for a subtle reason like that. But not saying anything at all about it seems definitely ill-advised.
@paralin I'm not yet convinced that entirely leaving it up to the implementation is good enough. The point of this set of interfaces is to decouple the code using the filesystem from the code implementing the filesystem. If the filesystem can do as it pleases, then reasonable code will not behave as expected, and we won't be able to decide where the bug is.
@Merovius but this is the reality we already deal with when it comes to filesystems. The job of this interface would just be to make it possible for applications to work seamlessly with filesystems of more varieties.
Being able to substitute in various virtual filesystems without having to go through the overhead of a FUSE driver is an extremely useful thing.
Filesystems are a messy reality, but that does not make an interface like this less useful. If a given filesystem can’t meet your application’s needs, that’s fine. As a developer, you’re still going to be responsible for testing these things work together.
The Go standard library already has a read-only filesystem abstraction, and third party implementations can still do whatever they want. This is nothing new.
@Merovius, I think it's worth expressing these parts of the interface in terms of invariants that must hold, although those invariants may be conditional. Unfortunately, most of the useful-seeming invariants already don't hold for real files, and I don't think we can reasonably require semantics that not even os
provides.
For example: one might want the property that creating a file without the “user read” permission prevents the same program from subsequently opening the same file for read. But for the os
package, that invariant today does not hold on Unix platforms when the user is running as root.
So I'm not sure that there is really anything useful we can say about permissions, except perhaps that the fs.FS
implementation should pass the permission bits down to the underlying implementation. 🤷♂️
If the filesystem can do as it pleases, then reasonable code will not behave as expected, and we won't be able to decide where the bug is.
Again, this is already the case. Filesystems can do whatever they want, whenever they want. Unless we remove filesystem support from Go entirely, this will continue to be the case.
I do not understand why things are any different here.
The standard library would likely provide two implementations: one that can open a directory and provides no real logic other than just passing these calls directly to the os
package, and one for an in memory filesystem. Beyond that, it is entirely up to the implementations to offer something that developers want to use. No one is ever forced to use any specific implementation, just like you’re not currently forced to use Go only on ext4
filesystems. You can run Go programs on FAT
filesystems that support (virtually) nothing that is expected to be supported.
These kinds of complex expanding discussions are exactly why I suggested that if we do anything at all we should do the simplest possible thing (#45757 (comment)).
People already can and do create writable filesystem interfaces that live outside of the standard library. If we're going to add something to the standard library it needs to be simple and clear. If we have trouble with that, then we shouldn't add it to the standard library. Which is OK.
Saying it behaves the same as the os
package is a simple thing.
As far as I can tell, that is the simplest thing that matches with reality.
Trying to build a perfect abstraction is the complicated thing. That's the part of this discussion that is "expanding", and I do not think anyone can come to a consensus on what a perfect abstraction would look like for filesystems. This is why we should stick with the interface that we already have, just made to be compatible with in-process filesystems.
For filesystems without fine-grained permissions, I think we should follow the Windows Chmod behavior of the os package:
Alternatively, writeable interfaces could be parameterized:
type OpenFileFS[Perm any] interface {
FS
OpenFile(name string, flag int, perm Perm) (File, error)
}
type MkdirFS[Perm any] interface {
FS
Mkdir(name string, perm Perm) error
}
type ChmodFile[Perm any] interface {
File
Chmod(mode Perm) error
}
Most implementations may stick to FileMode as Perm, but some may define their own types.
I agree with everyone else, all that's needed is to make some interfaces that mirror what the os package does. Allow the underlying fs to implement the interfaces. It doesn't need to be any more complicated than that.
@Merovius I personally don't understand what part of it isn't convincing. Are you holding out for an infallible abstraction? No one here (AFAICT) is asking for Go to solve the decades-long problem that filesystems naturally offer a wide array of guarantees and features. As best as I can tell, that problem is completely intractable. Why is it okay for the Go standard library to offer the os
filesystem interfaces at all, if it isn't okay for things to be implementation-specific?
People are asking for a standardized way to build and share code that isn't locked to a particular concrete filesystem implementation. This is what the os
package already offers, it's just a step short of allowing Go developers to define their own filesystems and use the same interface to access those. If it weren't such a pain to offer novel filesystems at the operating system level, I think people would be happy with that, but in the reality we live in... it would be extremely beneficial to be able to offer filesystems to other code that lives within our own process.
I also don't understand why you perceive read-only fs.FS as something different. Implementations can still do whatever they want there too, and that has been fine.
Are you holding out for an infallible abstraction?
No, I'm saying there can't be one, so we shouldn't have one.
Why is it okay for the Go standard library to offer the os filesystem interfaces at all, if it isn't okay for things to be implementation-specific?
The os package at least signals that this problem is operating system specific. But, yes, TBQH I find myself doubting the os package for that exact reason. I almost never use it to write files, instead relying on things like renameio. And even that makes me feel queasy.
I also don't understand why you perceive read-only fs.FS as something different.
Because reading operations are far easier to define uniformly and their failure modes are far easier to understand. If a read fails, you get an error and you know you can't trust the data you read. If a write fails, you know essentially nothing. There is no way to recover from that. And a write can even fail without returning an error. There is pretty much no way to write safe code that writes files, unless you're willing to spend a lot of time and effort to introspect the file system you are running on and write very careful code specifically for it.
But as I said, I already accepted that I'll die lonely on this hill. I think convincing me should be a non-goal. I probably shouldn't even have repeated myself.
The standard library would likely provide two implementations: one that can open a directory and provides no real logic other than just passing these calls directly to the os package, and one for an in-memory filesystem.
I think at the very least, there should also be one for archive/zip. It's been a major motivation for fs.FS from the start and it's popped up a couple of times in this discussion. Arguably, there should also be one for archive/tar, if only because that will surface more interesting issues with trying to define and implement a generic interface.
In particular, both of these raise the question of whether it should be safe to call WriteFile (or whatever) concurrently. I think to be useful, it generally should be (Copy(dst, src fs.FS) error might have pretty bad performance if it couldn't write concurrently), but I'm not sure how that could work with archive/* and similar use cases, unless you block writing the next file until the previous one has been Close()d.
I agree it would be nice to see where else it could be used in the standard library, and those would be great use cases too.
As far as your earlier point about the failure modes of a read-only filesystem, their failure modes can be just as catastrophic as you see with writable filesystems. If a read-only filesystem silently returns partial or corrupted data, that is every bit as catastrophic as silently failing to write data, since that bad data can influence all sorts of real-world events or eventually flow into a writable filesystem further down the line. It is important for developers to select implementations that do what the developers need them to do, and do so with a level of reliability that the developer is happy with. Read-only vs writable is only tangentially important in my view, when talking about the risks of filesystems.
In particular, both of these ask the question of whether it should be safe to call WriteFile (or whatever) concurrently
The unsatisfying answer provided throughout this thread applies here: it depends on the implementation. It's entirely possible for someone to have a use-case for a filesystem that isn't thread-safe. For something in the standard library, it has such a wide audience that it is probably worth a small performance hit to ensure that it will handle concurrent access safely, but being safe doesn't mean that the implementation has to refrain from returning errors for concurrent access.
If we just walk through this hypothetical, archive files are particularly implementation-sensitive. For archive/tar, it probably wouldn't be a good idea to allow mutating a file at all once it is closed, since the general solution for mutating an existing file within an archive is to rewrite the entire archive file. Only one file should be open for writing at a time, enforced by returning a "device or resource busy" error to anyone else that tries to open a file for writing while that one is open. This way the new file can be efficiently appended to the end of the archive. The number of concurrent readers of existing files would not be limited, and the file being written would likely not be available to readers until it is closed, just for simplicity of implementation. Obviously someone could implement an archive format that doesn't suffer from any of these restrictions, and their implementation of the writable filesystem interface would reflect that.
For your example of Copy, this works flawlessly. There is an existing file in the archive named source. The consumer of this interface wishes to copy it to dest. The Copy function would open the source file (source), open a new file (dest) for writing, and start copying the data into dest. Only one file is open for writing, so this causes no problems.
If we come up with a convoluted example where we need to simultaneously open more than one file for writing, then archive/tar probably isn't the right choice. That's just a limitation of the tar format. A writable filesystem that exposes an interface for AWS S3 would not suffer from this limitation at all: you could open lots of files for simultaneous writing and not run into any "device busy" errors. The right filesystem for a particular job is up to the application developer to research and test, which is better than the current situation for the os package, where you just get whatever filesystem happens to be running on the target computer, and the developer often has no control over that at all.
The unsatisfying answer provided throughout this thread applies here: it depends on the implementation.
I think that means the answer is "no". Again, the entire point here is to be able to write abstract code that can work with arbitrary implementations, and if code can't safely assume that it can concurrently call WriteFile, then we should say so.
For your example of Copy, this works flawlessly. There is an existing file in the archive named source. The consumer of this interface wishes to copy it to dest. The Copy function would open the source file (source), open a new file (dest) for writing, and start copying the data into dest. Only one file is open for writing, so this causes no problems.
My understanding is that Copy benefits significantly, performance-wise, if it spawns a separate goroutine per directory. At least I remember reading an article to that effect recently.
If we come up with a convoluted example where we need to simultaneously open more than one file for writing, then archive/tar probably isn't the right choice.
That's not generally how abstractions work. When writing Copy, you have to decide whether you do a concurrent copy or not. And you don't know what implementation you are called on. Again, that's why the abstraction exists: to be able to write code that works without having to know the details of the implementation.
I think it's fine not to specify that concurrent writes must be safe; after all, concurrent writes to an io.Writer are also not safe, in general. But we should be aware that that inherently limits the usefulness of generic code like Copy.
Again, that's why the abstraction exists - to be able to write code that works without having to know the details of the implementation.
Yes and no... the point of the abstraction is to make the implementation swappable. It doesn't mean that every implementation is guaranteed to work for every scenario. It's the same with io.Writer. io.Writer makes no guarantee that you can write an unlimited amount of data, so particular implementations of io.Writer may not work for every use case. There is still utility in being able to swap out the concrete implementations without having to rewrite the calling code or (worse) have multiple completely separate implementations of the calling code that are just duplicates.
My understanding is that Copy benefits significantly, performance-wise, if it would spawn a separate goroutine per directory.
I believe that is highly dependent on the underlying physical media. If the underlying media is a spinning hard drive, this would tank the performance. If the underlying media is an Optane SSD, it would probably improve performance quite a bit. For things in between, the results would likely fall in between those extremes. I don't think the results generalize well enough for a standard library implementation to default to parallel copying, but it would be interesting to see what the research says.
Especially considering the context, parallel copying into an archive/tar would be terrible for performance no matter what the implementation looks like, just given the nature of the format. A developer having the option to use a CopyParallel would make sense. Having it be the default would not make sense to me. If someone would benefit from parallelism with the filesystems they're dealing with, they should make that choice consciously. Someone choosing to support filesystems that both benefit and don't benefit from parallelism could wrap the latter in an abstraction that meets their needs, allowing a single code path to cover all use cases. Perhaps that would be a MutexFS abstraction that wraps those filesystems and prevents multiple concurrent accesses, relying on the developer to have confirmed that this won't cause any deadlocks, just a change in control flow.
I agree that the interface definitions for a writable fs.FS must describe its concurrency properties, such as any limits on the number of files that may be opened or written concurrently, and that the simplest definition of the interface would forbid concurrent calls to any method that may modify the filesystem's contents.
(Perhaps it would be possible to also define an extension method that reports whether concurrency is allowed, but I think that would be introducing more complexity than is appropriate for this proposal.)
Filesystems are messy. The goal here is not to magically make them simple; it's to make them swappable. The os package is already an abstraction over filesystems. I just want that abstraction to extend to developer-defined filesystems. This would make many things easier: testing could use in-memory filesystems; a single library could seamlessly support any filesystem that meets its requirements, which could be the regular os filesystem abstraction, it could be S3, it could be GCP Cloud Storage, it could even be a tar file.
I have been writing my own archive format. Code that expects a File handle should be able to read and write my files just as if they were "real" operating system files. That code shouldn't have to care. Code that expects to open a directory and look around should be able to do that just the same as if it were a real directory. I should be able to pass an fs.FS throughout this application code, and it shouldn't matter whether it is a real directory or my archive format, as long as my code is meeting the needs of the application.
Right now, there is no good way to handle this as a library. Importing a third-party abstraction is unsatisfying, since other libraries won't be built around those. Either they expose the concrete os types, or expose some other incompatible abstraction. It's just an untenable situation for anyone who cares about this stuff. The os interface is supposed to be filesystem agnostic.
@bcmills I agree concurrency is something that should be defined.
I think either definition is potentially fine, but it is probably best to define the interface's expectations as close to the OS interface as possible. That means assuming concurrent access to a filesystem is allowed: the filesystem can return errors in the cases where it isn't, but the filesystem object itself should probably be thread-safe by default. If a library chooses to offer something that isn't thread-safe, it can make a note of how it is violating the expected interface. The language cannot enforce this invariant, for better or worse.
the filesystem can return errors if there are cases where it isn't
I strongly oppose this. If the interface says that concurrent access is allowed, then a filesystem that can't support it needs to implement locking. Or we say it isn't allowed. But Copy (or other fs.FS-consuming code) should not just fail to work.
(side-note: that's the article I was talking about before. Just for the curious)