Kotlin/kotlinx.coroutines

Potential bug involving Undispatched and withContext

ListenableFuture opened this issue · 5 comments

I ran into this issue where I have a method like the following:

suspend fun doSomeIo() = withContext(Dispatchers.IO) {
  // Code that should run on IO
}

The code should always run on the IO dispatcher and I think it should be guaranteed from the way I wrote it.

However, it turns out that if someone calls my function like so, it can actually run my code on a different thread (like the Android Main thread):

val ioScope = CoroutineScope(Dispatchers.IO + SupervisorJob())

// Starts on Main thread
ioScope.launch(start = CoroutineStart.UNDISPATCHED) {
  doSomeIo()
}

This seems to happen because they start it UNDISPATCHED, so it continues on the Main thread, which makes sense. However, when the withContext in doSomeIo is checked it sees it is the same dispatcher as their ioScope and so doesn’t dispatch again, even though it currently isn’t actually on the IO dispatcher.

This seems like a bug because as a code author of doSomeIo I can’t guarantee what thread my code is run on, and also the effect can actually go through multiple functions, classes, and files making it difficult to debug.

I was hoping to ask your team for their perspective on this. Thanks!

Yes, I agree that the caller's ability to force the code to run undispatched until the first suspension point breaks encapsulation. Indeed, to force the code to run in the correct context all the time, as a workaround, an initial dispatch has to be performed: withContext(dispatcher) { yield(); actualCode() }.

This problem can be fixed in kotlinx.coroutines by extending the Job state machine, introducing a new state: "Was created, and maybe even ran some code, but hasn't gone through a proper initial dispatch yet" (this would also help with #4382 (comment)). Then, withContext could detect that no proper dispatch happened yet, in which case, in would ignore the fast path of avoiding a dispatch when the dispatcher stays the same.

I'm not sure about the tradeoffs of fixing the issue. I don't believe I've seen any errors being caused by this. CoroutineStart.UNDISPATCHED is indeed an instrument for breaking the encapsulation, and being a solution for a niche problem, it's used sparingly in practice: https://grep.app/search?q=CoroutineStart.UNDISPATCHED On the other hand, if we change withContext to be protected against this surgical tool, valid (albeit clearly non-idiomatic) use cases can be broken. If UNDISPATCHED is involved, it means the code relies on a specific execution order and is brittle as is.

If someone actually encounters a bug in their code because of this withContext behavior, then clearly, we should do the right thing and patch withContext, but if not, I believe we should leave this part untouched. Have you encountered or heard of someone encountering an issue caused by this in practice?

Thanks for the details, I added some responses below.

Have you encountered or heard of someone encountering an issue caused by this in practice?

In our case, this caused an actual bug in one of our APIs which resulted in it running disk reads on the main thread.

Indeed, to force the code to run in the correct context all the time, as a workaround, an initial dispatch has to be performed: withContext(dispatcher) { yield(); actualCode() }

Adding yield() would ensure the correct context, however it would unfortunately introduce an extra dispatch in other cases, and would also need to be added to all withContext blocks to prevent this issue, so I don’t think there’s any practical workaround we can use without a fix.

In our case, this caused an actual bug in one of our APIs which resulted in it running disk reads on the main thread.

This is an important point, thanks! Do you happen to remember the details of how this bug was introduced? That is, why did someone use UNDISPATCHED for launching a coroutine that calls withContext(Dispatchers.IO) before the first dispatch?

The bug was caused due to this launchFuture utility, which defaults to UNDISPATCHED, and is called from getDataAsync().