typelevel/cats-effect

After IO.blocking(...), execution stays on blocking thread pool

THeinemann opened this issue · 3 comments

Hi cats-effect team,

while debugging a program, I found that when some IOs are executed on the blocking thread pool, the IOs that follow them in the execution (chained via flatMap) are also executed on the blocking thread pool.

For example:

import cats.effect.IO
import cats.effect.unsafe.implicits.global
import scala.concurrent.duration.DurationInt
{
  for {
    _ <- IO {
      println((0, Thread.currentThread()))
    }
    _ <- IO.blocking {
      println((1, Thread.currentThread()))
    }
    _ <- IO {
      println((2, Thread.currentThread()))
    }
    _ <- IO.sleep(2.millis)
    _ <- IO {
      println((3, Thread.currentThread()))
    }
  } yield ()
}.unsafeRunSync()

Prints out this for me:

(0,Thread[io-compute-2,5,main])
(1,Thread[io-compute-blocker-2,5,main])
(2,Thread[io-compute-blocker-2,5,main])
(3,Thread[io-compute-5,5,main])

I would have expected the third line (the one numbered 2) to also print a thread name from the compute pool (i.e. one without blocker in its name).

This seems unexpected to me, especially considering the example in https://typelevel.org/cats-effect/docs/thread-model#blocking (which prints "current pool" after the blocking execution is finished, implying that the execution switched back to the previous pool).

I tested this with cats-effect 3.5.3 and 3.5.4.

Note that when using IO.interruptible instead of IO.blocking, the example behaves as I expected (the follow-up IO runs on a compute thread).
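
For comparison, here is a reduced sketch of that variant (trimmed to the two relevant steps; the thread names mentioned in the comment are just what I observed on my machine):

import cats.effect.IO
import cats.effect.unsafe.implicits.global

{
  for {
    _ <- IO.interruptible(println((1, Thread.currentThread())))
    _ <- IO(println((2, Thread.currentThread()))) // printed an io-compute-* thread for me
  } yield ()
}.unsafeRunSync()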

I think it works as intended. Sometimes the fiber doesn't switch thread pools, for optimization reasons: https://typelevel.org/cats-effect/docs/faq#why-is-my-io-running-on-a-blocking-thread

This is indeed working as intended. :) Basically there's a tradeoff here: if we proactively and immediately shift back from the blocker to the compute worker, we can guarantee you never see compute work on the blocking subpool, but that cost might be paid in vain if you just go right back to blocking again (which commonly happens). So the pool is tuned to optimize this common case of repeated blocking actions (separated by flatMaps and a few maps) by leaving the work on the blocker. This does result in some contention in the kernel-level scheduler, but practical tests suggest the cost of that contention is lower than the savings of avoiding the unnecessary context shift (and subsequent shift back).
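
To make that common case concrete, here is a rough sketch (the helper and file paths are made up for illustration, not taken from this issue) of the pattern the pool is tuned for: several blocking stages separated only by flatMaps, where shifting back to a compute worker between each stage would just be wasted churn.

import cats.effect.IO
import java.nio.file.{Files, Paths}

// Three consecutive blocking regions chained with flatMaps; the pool keeps the
// fiber on the blocker thread rather than shifting it back to a compute worker
// between each region and then immediately back to a blocker again.
def copyFile(from: String, to: String): IO[Long] =
  for {
    lines <- IO.blocking(Files.readAllLines(Paths.get(from)))
    _     <- IO.blocking(Files.write(Paths.get(to), lines))
    size  <- IO.blocking(Files.size(Paths.get(to)))
  } yield size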

In the worst case scenario, where you just have a single blocker and then don't block again for a while, the pool will shift the fiber back at worst when you hit the auto-cede boundary (by default, once every 1024 IO stages), but it's likely you'll be shifted back before then by hitting some sort of asynchronous suspension (like sleep in your example).
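
For completeness, a minimal sketch of forcing that shift earlier (the labels are illustrative, and treating an explicit IO.cede as the same yield point the auto-cede boundary inserts is my reading rather than something stated above): after the cede, the continuation should be running on a compute worker again.

import cats.effect.IO
import cats.effect.unsafe.implicits.global

{
  for {
    _ <- IO.blocking(println(("blocking", Thread.currentThread())))
    _ <- IO.cede // explicit yield point, analogous to the automatic one every 1024 stages
    _ <- IO(println(("after cede", Thread.currentThread())))
  } yield ()
}.unsafeRunSync()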

Thanks for the explanations!