rust-lang/rust

Tracking issue for fs::walk_dir

alexcrichton opened this issue · 15 comments

This is a tracking issue for the fs_walk unstable feature in the standard library. Some of the open questions currently are:

  • There are many, many ways to walk a directory, and there should probably be something like a WalkBuilder structure.
  • The current walking strategy holds open a lot of file descriptors - #23715
  • The default walking strategy has not been well audited for all manner of fs corner cases

The C++ <filesystem> implementation of walking directories is quite a useful reference in terms of what options we may want as well as what we haven't implemented. Note that this is a large enough feature that this may want to be prototyped outside the standard library first and then perhaps consider moving in at a later date.

I've found Java's Files.walkFileTree really useful since it supports things like stopping iteration early and running code at different points in the iteration.

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#walkFileTree-java.nio.file.Path-java.nio.file.FileVisitor-
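
For readers unfamiliar with the visitor style, here is a minimal Rust sketch of the same idea: callbacks run before a directory, on each file, and after a directory, and their return value can skip a subtree or stop the walk early. The names `Flow`, `Visitor`, and `walk_tree` are hypothetical and exist only for this illustration; they are not part of std or of any crate discussed in this thread.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What a callback asks the walker to do next (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flow {
    Continue,
    SkipSubtree,
    Terminate,
}

/// Hooks that run at different points of the traversal (hypothetical trait).
pub trait Visitor {
    fn pre_dir(&mut self, _dir: &Path) -> Flow { Flow::Continue }
    fn file(&mut self, _file: &Path) -> Flow { Flow::Continue }
    fn post_dir(&mut self, _dir: &Path) -> Flow { Flow::Continue }
}

/// Recursive depth-first walk. Returns `Flow::Terminate` if a callback
/// stopped the walk early, `Flow::Continue` otherwise.
pub fn walk_tree<V: Visitor>(dir: &Path, v: &mut V) -> io::Result<Flow> {
    match v.pre_dir(dir) {
        Flow::Continue => {}
        // The directory is skipped, but its siblings are still visited.
        Flow::SkipSubtree => return Ok(Flow::Continue),
        Flow::Terminate => return Ok(Flow::Terminate),
    }

    // Read the whole directory up front and sort it so that the visit
    // order is deterministic. This is a simplification for the sketch.
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        entries.push(entry?.path());
    }
    entries.sort();

    for path in entries {
        // `symlink_metadata` does not follow symlinks, so a link to a
        // directory is reported through `file` and never descended into.
        let is_dir = fs::symlink_metadata(&path)?.is_dir();
        let flow = if is_dir { walk_tree(&path, v)? } else { v.file(&path) };
        if flow == Flow::Terminate {
            return Ok(Flow::Terminate);
        }
    }

    // `post_dir` only runs for directories whose contents were walked.
    Ok(match v.post_dir(dir) {
        Flow::Terminate => Flow::Terminate,
        _ => Flow::Continue,
    })
}
```

A caller implements only the hooks it cares about; for example, returning `Flow::SkipSubtree` from `pre_dir` for a build directory, or `Flow::Terminate` from `file` once a match is found.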

I'd like to take the conn on this, and echo this thought:

Note that this is a large enough feature that this may want to be prototyped outside the standard library first and then perhaps consider moving in at a later date.

I've taken a brief survey of other implementations, which range from every-feature-in-the-book (ftw in glibc) to very minimal (Go's filepath.Walk), so there's a lot of variation on what we can do. I'd like to start with the following goals:

  • Fix the "too many open fds" issue. I am particularly fond of making the common case fast and falling back to some kind of memory allocation if we exceed some preset (but configurable) limit on the number of open file descriptors. (I think this probably means switching to depth first search, since it "feels" like shallow but bigger directories are more common than very deeply nested directories. In particular, with depth first search, we can guarantee that at most one file descriptor is open for each depth.)
  • Provide some kind of facility for traversing symlinks (but this should be configurable and will require extra resources if we want to preserve the invariant that every item is yielded exactly once).
  • Provide control over iteration (e.g., "don't descend into this directory" or "skip the rest of this directory.")
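
To make the first goal concrete, here is a rough sketch of a depth-first walk that keeps one open handle per depth and, once a configurable limit is hit, drains the shallowest open directory into memory so its descriptor can be closed. This is only an illustration under stated assumptions: `BoundedWalk`, `Level`, `max_open`, and `open_handles` are hypothetical names, symlinks are never followed, and read errors encountered while draining are silently dropped for brevity.

```rust
use std::fs::{self, ReadDir};
use std::io;
use std::path::{Path, PathBuf};

/// One directory on the current root-to-leaf path.
enum Level {
    /// A live OS handle; costs one file descriptor.
    Open(ReadDir),
    /// Remaining entries drained into memory; costs no descriptor.
    Buffered(std::vec::IntoIter<PathBuf>),
}

/// Hypothetical walker, not the std or walkdir API.
pub struct BoundedWalk {
    stack: Vec<Level>,
    max_open: usize,
}

impl BoundedWalk {
    pub fn new(root: &Path, max_open: usize) -> io::Result<Self> {
        let first = Level::Open(fs::read_dir(root)?);
        Ok(BoundedWalk { stack: vec![first], max_open: max_open.max(1) })
    }

    /// Number of directory handles currently held open.
    pub fn open_handles(&self) -> usize {
        self.stack.iter().filter(|l| matches!(l, Level::Open(_))).count()
    }

    fn descend(&mut self, dir: &Path) -> io::Result<()> {
        if self.open_handles() >= self.max_open {
            // Over budget: drain the shallowest open handle into memory.
            // Replacing the `ReadDir` drops it, which closes the descriptor.
            // Simplification: entries that fail to read here are dropped.
            for slot in self.stack.iter_mut() {
                if let Level::Open(rd) = slot {
                    let rest: Vec<PathBuf> =
                        rd.by_ref().filter_map(|e| e.ok()).map(|e| e.path()).collect();
                    *slot = Level::Buffered(rest.into_iter());
                    break;
                }
            }
        }
        self.stack.push(Level::Open(fs::read_dir(dir)?));
        Ok(())
    }
}

impl Iterator for BoundedWalk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Always read from the deepest directory: depth-first order.
            let item = match self.stack.last_mut()? {
                Level::Open(rd) => rd.next().map(|r| r.map(|e| e.path())),
                Level::Buffered(it) => it.next().map(Ok),
            };
            match item {
                // Directory exhausted: popping drops (closes) its handle.
                None => {
                    self.stack.pop();
                }
                Some(Err(e)) => return Some(Err(e)),
                Some(Ok(path)) => {
                    // `symlink_metadata` does not follow symlinks.
                    let is_dir = fs::symlink_metadata(&path)
                        .map(|m| m.is_dir())
                        .unwrap_or(false);
                    if is_dir {
                        // If the directory cannot be opened, this sketch
                        // reports the error in place of the path.
                        if let Err(e) = self.descend(&path) {
                            return Some(Err(e));
                        }
                    }
                    return Some(Ok(path));
                }
            }
        }
    }
}
```

With a generous `max_open` the walk never allocates beyond the stack itself, which is the fast common case described above; with `max_open` set to 1 it degrades to buffering every ancestor directory but still visits the same set of paths.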

Most of the interfaces I've seen use a user-provided function to control iteration or even respond to errors. I feel like we should probably pursue the same functionality but with iterators if possible, although I'm not sure what it will look like yet. Possibly methods on the iterator itself? (Although this is a design that is not too common in the Rust world.)
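
One possible shape for "methods on the iterator itself" is sketched below. It is not a proposal for the final API: `Walk`, `skip_subtree`, and `skip_rest_of_dir` are hypothetical names, and each directory is read fully into memory and sorted to keep the example short, which sidesteps the file descriptor question entirely.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical walker whose iteration is steered by method calls.
pub struct Walk {
    /// Remaining entries of each directory on the current path.
    stack: Vec<std::vec::IntoIter<PathBuf>>,
    /// Directory yielded by the last `next` call, not yet entered.
    pending_dir: Option<PathBuf>,
}

/// Read one directory into a sorted list of paths.
fn list(dir: &Path) -> io::Result<std::vec::IntoIter<PathBuf>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        entries.push(entry?.path());
    }
    entries.sort();
    Ok(entries.into_iter())
}

impl Walk {
    pub fn new(root: &Path) -> io::Result<Self> {
        Ok(Walk { stack: vec![list(root)?], pending_dir: None })
    }

    /// "Don't descend into this directory": forget the directory that
    /// was just yielded. Does nothing if the last item was a file.
    pub fn skip_subtree(&mut self) {
        self.pending_dir = None;
    }

    /// "Skip the rest of this directory": drop the remaining siblings of
    /// the item that was just yielded (and do not descend into it).
    pub fn skip_rest_of_dir(&mut self) {
        self.pending_dir = None;
        self.stack.pop();
    }
}

impl Iterator for Walk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        // Enter the previously yielded directory unless it was skipped.
        if let Some(dir) = self.pending_dir.take() {
            match list(&dir) {
                Ok(entries) => self.stack.push(entries),
                Err(e) => return Some(Err(e)),
            }
        }
        loop {
            match self.stack.last_mut()?.next() {
                Some(path) => {
                    // `symlink_metadata` does not follow symlinks.
                    let is_dir = fs::symlink_metadata(&path)
                        .map(|m| m.is_dir())
                        .unwrap_or(false);
                    if is_dir {
                        self.pending_dir = Some(path.clone());
                    }
                    return Some(Ok(path));
                }
                // Directory exhausted: resume in its parent.
                None => {
                    self.stack.pop();
                }
            }
        }
    }
}
```

The cost of this design is that a plain `for` loop borrows the iterator for the whole loop, so callers have to write `while let Some(entry) = walk.next() { ... walk.skip_subtree(); }` instead. That is one reason the pattern is unusual in Rust.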

How does this sound for a reasonable starting point?

This may also be worth looking at:

What's New In Python 3.5: PEP 471 - os.scandir() function – a better and faster directory iterator

huonw commented

@SimonSapin fwiw, that PEP has apparently already been taken into account for the current design: #16236.

Never mind, then :)

@BurntSushi that all sounds great to me! I know that whenever I've wanted to use a fs walk-like function I've always wanted the option to say "don't descend into this directory" so it's certainly something I'd like to see!

This definitely seems like a good candidate for development outside the standard library as it can be a relatively meaty piece of functionality and in theory our fs bindings should give you all the support you need to build this.

I've finished my initial implementation: http://burntsushi.net/rustdoc/walkdir/

Overall, I'm pretty happy with it. It isn't clear to me whether it's destined for std. In particular, the crate's custom DirEntry type has grown a fair bit of complexity, mostly due to the handling of following symbolic links. Should I just let it bake on crates.io for now? Write an RFC? (I have acquired a fair amount of institutional knowledge that might be good to encode into an RFC, even if the intention is to postpone.) Do we deprecate std::fs::walk_dir?

I think the best path forward would be to maybe let it bake for a bit, move it to the nursery, see how it turns out, and then eventually write an RFC to become an official rust-lang crate, and perhaps finally move it into the standard library. I don't necessarily feel an urgency to move it into the standard library right now, and the fast iteration outside seems quite useful for now!

Sounds good to me!

👍

I wonder if we should go ahead and deprecate the in-tree version? Is there a path toward stabilization as-is?

Nominating for 1.6 discussion.

I'd vote to deprecate as well.

Deprecation sounds good. Stabilizing as-is definitely seems bad.

🔔 This issue is now entering its cycle-long final comment period for deprecation 🔔

The libs team discussed this in triage today and the decision was to deprecate.