typelevel/fs2

Bad performance of fs2.io.file.Files[IO].walk()

sven42 opened this issue · 4 comments

sven42 commented

I am migrating a tool to use fs2 (v3.9.2).
The first step is just to collect about 26000 files in a directory tree for further processing.

Here are the timings of this first step for several solutions:

  • java.nio.file.Files#walkFileTree: 0.5s
  • DeprecatedFilesApi.walk: 0.7s
  • fs2.io.file.Files#walk: 4s

Thanks for the report. Are you using Files[IO].walk(path) or one of the overrides? Could you also describe the directory tree a bit - branching factor, depth, etc.

sven42 commented

Thanks for your reply.

I am using Files[IO].walk(path)
The directory contains 26.000 files in 3.300 directories.
The branching factor is 5 and max depth is 8.

I hope this helps.

This doesn't entirely surprise me. walk has a lot of Stream.eval and essentially zero batching. A much faster approach would be to, at each locus, gather all children and partition them, batch-emitting anything which can be emitted and individually recursing anything which can't. The number of suspensions then should be O(n) in the number of directories within the hierarchy, whereas right now it is proportional to files + directories.

Minor update: this is so slow that I had to trim down the depth used in benchmarking in order to make it tractable on my laptop. Here's what I'm using to test:

@State(Scope.Thread)
class FilesBenchmark {

  private[this] var target: Path = _

  @Setup
  def setup() = {
    val file = File.createTempFile("fs2-benchmarks-", "-walk")
    file.delete()
    file.mkdir()
    target = Path(file.toString)    // ewwwww

    val MaxDepth = 4
    val Names = 'A'.to('E').toList.map(_.toString)

    def loop(cwd: File, depth: Int): Unit = {
      if (depth < MaxDepth) {
        Names foreach { name =>
          val sub = new File(cwd, name)
          sub.mkdir()
          loop(sub, depth + 1)
        }
      } else if (depth == MaxDepth) {
        Names foreach { name =>
          val sub = new File(cwd, name)
          sub.createNewFile()
          loop(sub, depth + 1)
        }
      }
    }

    loop(file, 0)
  }

  @Benchmark
  def countFiles(): Int = {
    Files[IO]
      .walk(target)
      .chunks
      .map(_.size)
      .fold(0)(_ + _)
      .compile
      .lastOrError
      .unsafeRunSync()
  }
}