Bad performance of fs2.io.file.Files[IO].walk()
sven42 opened this issue · 4 comments
I am migrating a tool to use fs2 (v3.9.2).
The first step is just to collect about 26000 files in a directory tree for further processing.
Here are the timings of this first step for several solutions:
java.nio.file.Files#walkFileTree
: 0.5sDeprecatedFilesApi.walk
: 0.7sfs2.io.file.Files#walk
: 4s
Thanks for the report. Are you using Files[IO].walk(path)
or one of the overrides? Could you also describe the directory tree a bit - branching factor, depth, etc.
Thanks for your reply.
I am using Files[IO].walk(path)
The directory contains 26.000 files in 3.300 directories.
The branching factor is 5 and max depth is 8.
I hope this helps.
This doesn't entirely surprise me. walk
has a lot of Stream.eval
and essentially zero batching. A much faster approach would be to, at each locus, gather all children and partition them, batch-emitting anything which can be emitted and individually recursing anything which can't. The number of suspensions then should be O(n) in the number of directories within the hierarchy, whereas right now it is proportional to files + directories.
Minor update: this is so slow that I had to trim down the depth used in benchmarking in order to make it tractable on my laptop. Here's what I'm using to test:
@State(Scope.Thread)
class FilesBenchmark {
private[this] var target: Path = _
@Setup
def setup() = {
val file = File.createTempFile("fs2-benchmarks-", "-walk")
file.delete()
file.mkdir()
target = Path(file.toString) // ewwwww
val MaxDepth = 4
val Names = 'A'.to('E').toList.map(_.toString)
def loop(cwd: File, depth: Int): Unit = {
if (depth < MaxDepth) {
Names foreach { name =>
val sub = new File(cwd, name)
sub.mkdir()
loop(sub, depth + 1)
}
} else if (depth == MaxDepth) {
Names foreach { name =>
val sub = new File(cwd, name)
sub.createNewFile()
loop(sub, depth + 1)
}
}
}
loop(file, 0)
}
@Benchmark
def countFiles(): Int = {
Files[IO]
.walk(target)
.chunks
.map(_.size)
.fold(0)(_ + _)
.compile
.lastOrError
.unsafeRunSync()
}
}