rust-lang/rust

walk_dir uses too many file descriptors

hugoduncan opened this issue · 6 comments

The current implementation of fs::walk_dir is breadth-first, and it holds an open file descriptor for every directory that has been discovered but not yet traversed. This can lead to "Too many open files" errors.

On my Mac, the default limit on open file descriptors is 256. A .git/objects directory can contain 257 directories, so a breadth-first search whose queue holds ReadDir objects (each of which keeps an open DIR handle) can easily hit this limit.

I can see two possible solutions: either changing the queue to hold paths rather than ReadDir objects, or switching to depth first traversal.
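A minimal sketch of the first option, assuming only std (`std::fs::read_dir`): the queue stores paths instead of open `ReadDir` handles, so at most one directory handle is open at any moment. `walk_bfs` is a hypothetical helper name, not a std API, and for simplicity it collects results into a `Vec` rather than being a lazy iterator.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical breadth-first walk that queues *paths*, not `ReadDir`
/// handles. Returns every path below `root` (root itself excluded).
fn walk_bfs(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut queue: VecDeque<PathBuf> = VecDeque::new();
    queue.push_back(root.to_path_buf());

    while let Some(dir) = queue.pop_front() {
        // This ReadDir lives only for the current iteration and is dropped
        // (closing its descriptor) before the next directory is opened.
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // DirEntry::file_type does not follow symlinks, so symlinked
            // directories are not descended into (this avoids cycles).
            if entry.file_type()?.is_dir() {
                queue.push_back(path.clone());
            }
            found.push(path);
        }
    }
    Ok(found)
}

fn main() -> io::Result<()> {
    for path in walk_bfs(Path::new("."))? {
        println!("{}", path.display());
    }
    Ok(())
}
```

The trade-off is that the queue of `PathBuf`s grows with the width of the tree, so descriptor pressure is exchanged for memory.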

I hit this error running the following on a directory tree that contains a .git directory and has a maximum directory depth of 9:

#![feature(fs_walk)]

use std::fs;
use std::io;
use std::path::Path;

fn main() {
    match walk() {
        Ok(_) => (),
        Err(e) => println!("ERROR {}", e)
    }
}

fn walk() -> Result<(), io::Error> {
    for f in try!(fs::walk_dir(&Path::new("."))) {
        let f = try!(f);
        println!("copy_tree {:?}", f.path());
    }
    Ok(())
}

nagisa commented

The same issue exists with depth-first traversal whenever the directory structure can be deeper than RLIMIT_NOFILE. I don't think I'd want to see paths being stored in memory either.

A hybrid approach similar to the one taken by ftw(3), which takes an nopenfd argument capping how many directories it holds open at once, could be implemented. Adding a parameter to specify the descriptor limit would be a breaking change.
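A sketch of such a hybrid, assuming only std: a depth-first walk that keeps at most `max_open` directory handles open. When another directory must be opened and the limit is reached, the shallowest open directory has its remaining entries read into memory and its handle closed. `walk_bounded` and `Level` are hypothetical names. glibc's ftw instead closes and later re-opens directories; buffering is, as far as we understand, closer to what the walkdir crate mentioned later in this thread does for its max_open setting. Only `(path, is_dir)` pairs are buffered, because on Unix a std `DirEntry` may keep its parent's directory stream alive.

```rust
use std::fs::{self, ReadDir};
use std::io;
use std::path::{Path, PathBuf};

/// One level of the hypothetical depth-first stack.
enum Level {
    /// A live directory handle (holds one file descriptor).
    Open(ReadDir),
    /// Remaining (path, is_dir) pairs held in memory, stored reversed so
    /// `pop()` yields them in original order. No descriptor is held.
    Buffered(Vec<(PathBuf, bool)>),
}

impl Level {
    fn next(&mut self) -> Option<io::Result<(PathBuf, bool)>> {
        match self {
            Level::Open(rd) => rd.next().map(
                |e: io::Result<fs::DirEntry>| -> io::Result<(PathBuf, bool)> {
                    let e = e?;
                    // file_type() does not follow symlinks (avoids cycles).
                    Ok((e.path(), e.file_type()?.is_dir()))
                },
            ),
            Level::Buffered(rest) => rest.pop().map(Ok),
        }
    }
}

/// Hypothetical bounded walk; `max_open` must be at least 1.
fn walk_bounded(root: &Path, max_open: usize) -> io::Result<Vec<PathBuf>> {
    assert!(max_open >= 1);
    let mut found = Vec::new();
    let mut stack = vec![Level::Open(fs::read_dir(root)?)];
    let mut open = 1; // number of Level::Open entries on the stack

    while let Some(top) = stack.last_mut() {
        let (path, is_dir) = match top.next() {
            Some(item) => item?,
            None => {
                // Directory exhausted: popping drops (closes) any handle.
                if let Some(Level::Open(_)) = stack.pop() {
                    open -= 1;
                }
                continue;
            }
        };
        found.push(path.clone());
        if is_dir {
            if open == max_open {
                // Close the shallowest open handle by buffering its rest.
                let idx = stack
                    .iter()
                    .position(|l| matches!(l, Level::Open(_)))
                    .unwrap();
                let mut rest = Vec::new();
                while let Some(item) = stack[idx].next() {
                    rest.push(item?);
                }
                rest.reverse();
                stack[idx] = Level::Buffered(rest); // drops the ReadDir
                open -= 1;
            }
            stack.push(Level::Open(fs::read_dir(&path)?));
            open += 1;
        }
    }
    Ok(found)
}

fn main() -> io::Result<()> {
    for path in walk_bounded(Path::new("."), 10)? {
        println!("{}", path.display());
    }
    Ok(())
}
```

Descriptor use is bounded by `max_open` regardless of depth, at the cost of holding the buffered entries of closed directories in memory.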

Oh dear, I definitely didn't intend for this to happen! I intended it to keep only an active list of directories proportional to the current depth. I do agree with @nagisa that DFS wouldn't solve the problem, but I suspect directories are more often wide than deep.

I would also love to add more configuration to Walk. I sketched out an idea or two in the RFC issue, including a WalkOptions structure, which may affect this as well.

clee commented

Yeah, this sounds like the problem I'm seeing for sure. 👍

This is fixed in an external crate: http://burntsushi.net/rustdoc/walkdir/

The in-tree walk_dir is now deprecated in favor of @BurntSushi's crate.