UndispatchedCoroutine leaks CoroutineContext via thread local when using custom ContinuationInterceptor
pyricau opened this issue · 4 comments
Bug description
To fix #2930 (thread locals using ThreadContextElement
not cleaned up), #3252 introduce a little hack (source) in UndispatchedCoroutine
which immediately saves the previous thread context to UndispatchedCoroutine.threadStateToRecover
within the UndispatchedCoroutine
constructor, only when the ContinuationInterceptor is not a CoroutineDispatcher:
internal actual class UndispatchedCoroutine<in T>actual constructor () : ScopeCoroutine<T>() {
init {
if (uCont.context[ContinuationInterceptor] !is CoroutineDispatcher) {
val values = updateThreadContext(context, null)
restoreThreadContext(context, values)
saveThreadContext(context, values)
}
}
If the continuation doesn't suspend, then UndispatchedCoroutine.afterResume()
isn't invoked and UndispatchedCoroutine.threadStateToRecover.remove()
is not called.
One thing of note, the javadoc for UndispatchedCoroutine.threadStateToRecover
claims this:
* Each access to Java's [ThreadLocal] leaves a footprint in the corresponding Thread's `ThreadLocalMap`
* that is cleared automatically as soon as the associated thread-local (-> UndispatchedCoroutine) is garbage collected.
Unfortunately, that's not exactly correct. When the ThreadLocal is garbage collected, its value actually stays in the ThreadLocalMap
, until either the thread itself gets garbage collected, or the ThreadLocal map cleans up stale entries, which only happens when using thread locals some more.
So when the right conditions apply, the CoroutineContext instance ends up being held in memory long after it's no longer needed.
To reproduce this, you need:
- A CoroutineContext that contains a ContinuationInterceptor that is not a CoroutineDispatcher (to trigger the UndispatchedCoroutine constructor hack)
- Call
withContext()
, changing the CoroutineContext but without changing the ContinuationInterceptor (to trigger thewithContext()
fast path 2 that starts the coroutine on the same ContinuationInterceptor, wrapping it with UndispatchedCoroutine.
Here's a repro: #4295
package kotlinx.coroutines
import kotlinx.coroutines.testing.*
import org.junit.Test
import java.lang.ref.*
import kotlin.coroutines.*
import kotlin.test.*
class CustomContinuationInterceptorTest : TestBase() {
fun `CoroutineDispatcher not suspending does not leak CoroutineContext`() =
ensureCoroutineContextGCed(Dispatchers.Default, suspend = false)
@Test
fun `CoroutineDispatcher suspending does not leak CoroutineContext`() =
ensureCoroutineContextGCed(Dispatchers.Default, suspend = true)
@Test
fun `CustomContinuationInterceptor suspending does not leak CoroutineContext`() =
ensureCoroutineContextGCed(
CustomContinuationInterceptor(Dispatchers.Default),
suspend = true
)
// This is the one test that fails
@Test
fun `CustomContinuationInterceptor not suspending leaks CoroutineContext`() =
ensureCoroutineContextGCed(
CustomContinuationInterceptor(Dispatchers.Default),
suspend = false
)
@Test
fun `CustomNeverEqualContinuationInterceptor suspending does not leak CoroutineContext`() =
ensureCoroutineContextGCed(
CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
suspend = true
)
@Test
fun `CustomNeverEqualContinuationInterceptor not suspending does not leak CoroutineContext`() =
ensureCoroutineContextGCed(
CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
suspend = false
)
private fun ensureCoroutineContextGCed(coroutineContext: CoroutineContext, suspend: Boolean) {
runTest {
lateinit var ref: WeakReference<CoroutineName>
val job = GlobalScope.launch(coroutineContext) {
val coroutineName = CoroutineName("Yo")
ref = WeakReference(coroutineName)
withContext(coroutineName) {
if (suspend) {
delay(1)
}
}
}
job.join()
System.gc()
assertNull(ref.get())
}
}
}
class CustomContinuationInterceptor(private val delegate: ContinuationInterceptor) :
AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {
override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
return delegate.interceptContinuation(continuation)
}
}
class CustomNeverEqualContinuationInterceptor(private val delegate: ContinuationInterceptor) :
AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {
override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
return delegate.interceptContinuation(continuation)
}
override fun equals(other: Any?) = false
}
Fix
I see a few options:
- Implement #3253 which presumably means the hack can be removed.
- Provide users with a way to escape the
if (uCont.context[ContinuationInterceptor] !is CoroutineDispatcher) {
check somehow (e.g. introduce an additional funky interface..)
Workaround
Override equals in any custom ContinuationInterceptor
implementation and have it return always false, to skip the fast way that's creating the UndispatchedCoroutine creation.
How we found this
I ran into some Compose related memory leaks in Square's UI tests back in February, asked the Compose team if they knew anything about that. They didn't. Then yesterday folks from the Compose team reached out because they're enabling leak detection in their test suite and were running into the exact same leak. We worked together on analyzing a heap dump and debugging and eventually root cause it to this bug.
The Compose team is looking into whether they can start using CoroutineDispatcher implementations directly in Compose tests (for other reasons as well, e.g. see this issue)
Here's the leaktrace as reported by leakcanary:
====================================
HEAP ANALYSIS RESULT
====================================
1 APPLICATION LEAKS
References underlined with "~~~" are likely causes.
Learn more at https://squ.re/leaks.
44298 bytes retained by leaking objects
Signature: c4797f2a268bbe8b50072affe05b2f82ce29f5ed
┬───
│ GC Root: Thread object
│
├─ java.lang.Thread instance
│ Leaking: NO (the main thread always runs)
│ Thread name: 'main'
│ ↓ Thread.threadLocals
│ ~~~~~~~~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap instance
│ Leaking: UNKNOWN
│ Retaining 446.6 kB in 9221 objects
│ ↓ ThreadLocal$ThreadLocalMap.table
│ ~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry[] array
│ Leaking: UNKNOWN
│ Retaining 446.6 kB in 9220 objects
│ ↓ ThreadLocal$ThreadLocalMap$Entry[2]
│ ~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry instance
│ Leaking: UNKNOWN
│ Retaining 444.0 kB in 9136 objects
│ ↓ ThreadLocal$ThreadLocalMap$Entry.value
│ ~~~~~
├─ kotlin.Pair instance
│ Leaking: UNKNOWN
│ Retaining 444.0 kB in 9135 objects
│ ↓ Pair.first
│ ~~~~~
├─ kotlin.coroutines.CombinedContext instance
│ Leaking: UNKNOWN
│ Retaining 444.0 kB in 9134 objects
│ ↓ CombinedContext.left
│ ~~~~
├─ kotlin.coroutines.CombinedContext instance
│ Leaking: UNKNOWN
│ Retaining 442.9 kB in 9112 objects
│ ↓ CombinedContext.left
│ ~~~~
├─ kotlin.coroutines.CombinedContext instance
│ Leaking: UNKNOWN
│ Retaining 442.9 kB in 9111 objects
│ ↓ CombinedContext.element
│ ~~~~~~~
├─ kotlinx.coroutines.internal.ScopeCoroutine instance
│ Leaking: UNKNOWN
│ Retaining 441.0 kB in 9040 objects
│ ↓ ScopeCoroutine.uCont
│ ~~~~~
├─ androidx.compose.foundation.gestures.DefaultScrollableState$scroll$2 instance
│ Leaking: UNKNOWN
│ Retaining 441.0 kB in 9038 objects
│ Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│ ↓ DefaultScrollableState$scroll$2.$block
│ ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic$scroll$2 instance
│ Leaking: UNKNOWN
│ Retaining 104 B in 2 objects
│ Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│ ↓ ScrollingLogic$scroll$2.this$0
│ ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic instance
│ Leaking: UNKNOWN
│ Retaining 796 B in 33 objects
│ ↓ ScrollingLogic.overscrollEffect
│ ~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.AndroidEdgeEffectOverscrollEffect instance
│ Leaking: UNKNOWN
│ Retaining 484 B in 12 objects
│ ↓ AndroidEdgeEffectOverscrollEffect.edgeEffectWrapper
│ ~~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.EdgeEffectWrapper instance
│ Leaking: UNKNOWN
│ Retaining 56 B in 1 objects
│ context instance of androidx.activity.ComponentActivity with mDestroyed = true
│ ↓ EdgeEffectWrapper.context
│ ~~~~~~~
╰→ androidx.activity.ComponentActivity instance
Leaking: YES (ObjectWatcher was watching this because androidx.activity.ComponentActivity received Activity#onDestroy() callback and Activity#mDestroyed is true)
Retaining 44.3 kB in 797 objects
key = 6bdc725d-e124-437c-895d-437bf03213af
watchDurationMillis = 5182
retainedDurationMillis = 174
mApplication instance of android.app.Application
mBase instance of android.app.ContextImpl
For anyone with access to the ASG slack, here's the investigation thread: https://androidstudygroup.slack.com/archives/CJH03QASH/p1733875296139909?thread_ts=1708204699.328019&cid=CJH03QASH
Thanks for a really comprehensive and easy-to-follow bug report!
The issue is clear, and the potential solution is easy to implement (though hard to accept :)) and is hidden in the problem statement: If the continuation doesn't suspend, then UndispatchedCoroutine.afterResume() isn't invoked
.
The undispatched handling path should just be aware of that -- either by having an explicit cleanup function in
ScopedCoroutine` or by if'ing the actual implementation
I have a prototype that passes your tests (again, thanks!), I'll polish it up tomorrow morning and open a PR. It should make it to 1.10 coroutines with Kotlin 2.1
Thank you for fixing!!
Could I request a backport for 1.8.x, we'd like to pick up this fix in compose without forcing our clients to kotlin 2.0?