Potentially harmful SynchronizedObject usages on the iOS platform
qwwdfsad opened this issue · 13 comments
We have a report in public Slack about very slow CMP application: https://kotlinlang.slack.com/archives/C0346LWVBJ4/p1732321550585009
Luckily, the user provided an Instruments profile trace, and it shows that huge chunks of work are spent in our synchronization.
Notably, a good chunk (41%) of the time between 02:01 and 02:10 is spent within kfun:kotlinx.coroutines.internal.LimitedDispatcher.obtainTaskOrDeallocateWorker#internal -> kfun:kotlinx.atomicfu.locks.SynchronizedObject#lock(){}. It requires further investigation, but it is worth taking a look.
Instruments file for the history: Hang iPhone SE CMP 1.7.0.trace.zip
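For readers unfamiliar with those frames, here is a minimal sketch of why a parallelism-limited dispatcher ends up in SynchronizedObject.lock() on its hot path. The names (LimitedQueueSketch, obtainTaskOrRetire) are invented, and this is not the actual kotlinx.coroutines implementation:

import kotlinx.atomicfu.locks.SynchronizedObject
import kotlinx.atomicfu.locks.synchronized

// Illustration only: a dispatcher that caps parallelism typically guards its queue and
// worker bookkeeping with a lock, so every dispatched task pays lock()/unlock() on the
// hot path, and on Kotlin/Native that lock is a SynchronizedObject.
private class LimitedQueueSketch(private val parallelism: Int) {
    private val lock = SynchronizedObject()
    private val queue = ArrayDeque<() -> Unit>()
    private var workers = 0

    // Returns true if the caller should spin up a new worker for this task.
    fun enqueue(task: () -> Unit): Boolean = synchronized(lock) {
        queue.addLast(task)
        if (workers < parallelism) { workers++; true } else false
    }

    // Conceptually what obtainTaskOrDeallocateWorker does: under the lock, hand the
    // worker its next task, or retire the worker when the queue is empty.
    fun obtainTaskOrRetire(): (() -> Unit)? = synchronized(lock) {
        queue.removeFirstOrNull().also { if (it == null) workers-- }
    }
}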
Waiting for the user to confirm the specific coroutines version, as atomicfu got a lock upgrade recently.
Will this issue be considered solved if LimitedDispatcher is rewritten to be lock-free, or are all usages of locks to be investigated?
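For concreteness, "lock-free" here would mean replacing the lock-guarded worker bookkeeping with atomics. A rough, hypothetical sketch using atomicfu follows; the names are invented, and this is neither the actual kotlinx.coroutines code nor a proposal from this thread:

import kotlinx.atomicfu.atomic

// Hypothetical sketch of a lock-free worker-allocation counter: a compare-and-set loop
// on an atomic counter replaces the lock around "may I start another worker?".
private class WorkerPermits(private val parallelism: Int) {
    private val running = atomic(0)

    // Returns true if the caller acquired a worker slot.
    fun tryAllocateWorker(): Boolean {
        while (true) {
            val current = running.value
            if (current >= parallelism) return false
            if (running.compareAndSet(current, current + 1)) return true
        }
    }

    fun releaseWorker() {
        running.decrementAndGet()
    }
}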
It would be nice to understand the root of the issue (it seems to be a LimitedDispatcher peculiarity) and address it instead. I don't think avoiding locks on K/N is justified.
I'll double-check, but after a brief refresher on the source code, I see no fault in LimitedDispatcher. Maybe that's the iOS priority inversion issue we've been hearing about. The lock in LimitedDispatcher is only taken for a short time, with no potential for spin-locking on it that I've noticed.
It might also be the case with the allocation-spamming lock; I've asked the user about the coroutines version.
As an update, I have updated my codebase to Kotlin 2.1 and Compose Multiplatform 1.7. Coroutines are still version 1.9.
The original genesis of this issue was my app freezing, but thankfully that has not happened. However, the phone is still getting hot, and profiling with Time Profiler shows a large number of SynchronizedObject calls. Attached is an image of the profiling. I will provide the updated trace file in the Kotlin Slack.
Thanks! The culprit seems to be https://github.com/Kotlin/kotlinx-atomicfu/blob/bad63e743ac905a298386446a82a88225c2f71fc/atomicfu/src/nativeMain/kotlin/kotlinx/atomicfu/locks/Synchronized.kt#L13-L66. Work is in progress to replace these locks with a more efficient implementation on the atomicfu side, so the problem is expected to be solved in future releases. It won't be very soon, though, as writing efficient mutex implementations takes a lot of effort.
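To make the linked code easier to follow, here is a heavily simplified sketch of the pattern it uses; the names are invented, and the real logic lives in the linked file. Every acquisition publishes a freshly allocated lock-state object with a CAS, and contended acquisitions inflate the lock to a "fat" state backed by a native mutex that waiters block on:

import kotlinx.atomicfu.atomic

// Illustration only, not the real Synchronized.kt: the lock state is an immutable object
// replaced via CAS, so lock-heavy workloads allocate on every acquisition, and contention
// is resolved by inflating to a native-mutex-backed fat lock (omitted here).
private enum class Status { UNLOCKED, THIN, FAT }
private class LockState(val status: Status, val nestedLocks: Int, val waiters: Int)

private class NativeLockSketch {
    private val state = atomic(LockState(Status.UNLOCKED, 0, 0))

    fun lock() {
        while (true) {
            val current = state.value
            if (current.status == Status.UNLOCKED) {
                // Even the uncontended fast path allocates a new LockState.
                if (state.compareAndSet(current, LockState(Status.THIN, 1, 0))) return
            }
            // THIN/FAT cases: the real code handles reentrancy, inflates to a fat lock
            // backed by a native mutex, and blocks waiters on it; elided in this sketch.
        }
    }
}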
It doesn't seem like there's evidence of coroutines using the mutex incorrectly, which was the original topic of this issue. Can this be closed?
I suppose, but a few follow-up questions:
- Might I be able to provide my app's codebase to JetBrains privately so you could use it for testing? I feel like this all especially pushes the limits of atomicfu through extensive use of SharedFlow, which is required for a websocket (a rough sketch of the pattern follows this list). It could be useful to benchmark whether the updates improve things.
- Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.
- If this issue is closed, is there any place I can still track updates that will improve this functionality?
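Roughly, the usage pattern is the following (a generic illustration with invented names, not the actual app code): every incoming websocket frame is emitted into a MutableSharedFlow that several screens collect, and on Kotlin/Native each emit and each collector update goes through SharedFlow's internal synchronization.

import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow

// Generic illustration of the pattern, not the actual app code: a chatty websocket
// pushes every frame through a shared flow that multiple collectors observe, so the
// flow's internal lock is taken on every emit and on every collector update.
class SocketMessages {
    private val incoming = MutableSharedFlow<String>(
        extraBufferCapacity = 64,
        onBufferOverflow = BufferOverflow.DROP_OLDEST,
    )
    val messages: SharedFlow<String> get() = incoming

    // Called by the websocket listener for each received text frame.
    fun onFrame(text: String) {
        incoming.tryEmit(text)
    }
}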
Thanks!
Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.
I thought this no longer happened with newer Kotlin and Compose versions? Is there a reason not to upgrade?
Might I be able to provide my app's codebase to Jetbrains privately so you could use it for testing?
Yes, this would be excellent, thanks! You could share a link / an archive / whatever with me in the kotlinlang Slack (Dmitry Khalanskiy [JB]) or add me to your project on GitHub. I'll share your project with other teams internally if you do.
If this issue is closed, is there any place I can still track updates that will improve this functionality?
Good point: I don't think there's a public GitHub issue in atomicfu for tracking this. Let's leave this one open, then, and I'll close it if we do open an atomicfu issue.
Thank you for the reply! I've sent you my app's codebase on the kotlinlang Slack.
Unfortunately, it is still happening with Compose 1.7 and Kotlin 2.1. I just profiled my app in release mode and was able to reproduce the freeze. It appears most likely to happen when scrolling lists: I'll be scrolling, navigating back and forth, scroll some more, and then it freezes.
What's interesting is that the "long" usage of SynchronizedObject.lock() shows up on more threads now: worker threads, child threads, and the main thread.
The file is too big to attach here, so I will send it to you as well.
As another update, as I was scrolling through the same list that can sometimes cause the freezing, I got this Thread Performance Checker message:
Thread Performance Checker: Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions
PID: 57512, TID: 2040891
Backtrace
=================================================================
3 DRiP Haus 0x0000000106148f0c _ZN12SkMessageBusIN5skgpu27UniqueKeyInvalidatedMessageEjLb1EE5Inbox4pollEPN12skia_private6TArrayIS1_Lb0EEE + 148
4 DRiP Haus 0x0000000106147818 _ZN15GrResourceCache13purgeAsNeededEv + 52
5 DRiP Haus 0x000000010622bd9c _ZN17GrMtlRenderTargetC1EP8GrMtlGpu7SkISize5sk_spI15GrMtlAttachmentES5_NS_7WrappedENSt3__117basic_string_viewIcNS7_11char_traitsIcEEEE + 288
6 DRiP Haus 0x000000010622c0f4 _ZN17GrMtlRenderTarget23MakeWrappedRenderTargetEP8GrMtlGpu7SkISizeiPU21objcproto10MTLTexture11objc_object + 468
7 DRiP Haus 0x0000000106222424 _ZN8GrMtlGpu25onWrapBackendRenderTargetERK21GrBackendRenderTarget + 204
8 DRiP Haus 0x0000000106139238 _ZN5GrGpu23wrapBackendRenderTargetERK21GrBackendRenderTarget + 156
9 DRiP Haus 0x000000010614068c _ZN15GrProxyProvider23wrapBackendRenderTargetERK21GrBackendRenderTarget5sk_spIN5skgpu16RefCntedCallbackEE + 92
10 DRiP Haus 0x00000001061e4a40 _ZN10SkSurfaces23WrapBackendRenderTargetEP18GrRecordingContextRK21GrBackendRenderTarget15GrSurfaceOrigin11SkColorType5sk_spI12SkColorSpaceEPK14SkSurfacePropsPFvPvESD_ + 484
11 DRiP Haus 0x000000010653e4a0 org_jetbrains_skia_Surface__1nMakeFromBackendRenderTarget + 124
12 DRiP Haus 0x0000000105ed55e0 kfun:org.jetbrains.skia.Surface.Companion#makeFromBackendRenderTarget(org.jetbrains.skia.DirectContext;org.jetbrains.skia.BackendRenderTarget;org.jetbrains.skia.SurfaceOrigin;org.jetbrains.skia.SurfaceColorFormat;org.jetbrains.skia.ColorSpace?;org.jetbrains.skia.SurfaceProps?){}org.jetbrains.skia.Surface? + 756
13 DRiP Haus 0x000000010697c568 kfun:androidx.compose.ui.window.MetalRedrawer.draw#internal + 7044
14 DRiP Haus 0x000000010697e994 kfun:androidx.compose.ui.window.MetalRedrawer.<init>$lambda$1#internal + 472
15 DRiP Haus 0x0000000106980d8c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.invoke#internal + 72
16 DRiP Haus 0x0000000106980e5c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.$<bridge-DNN>invoke(){}#internal + 72
17 DRiP Haus 0x0000000105b4c710 kfun:kotlin.Function0#invoke(){}1:0-trampoline + 100
18 DRiP Haus 0x0000000106981d34 kfun:androidx.compose.ui.window.DisplayLinkProxy.handleDisplayLinkTick#internal + 152
19 DRiP Haus 0x0000000106981de8 kfun:androidx.compose.ui.window.DisplayLinkProxy.$imp:handleDisplayLinkTick#internal + 144
20 QuartzCore 0x0000000189dc34ac _ZN2CA7Display15DisplayLinkItem9dispatch_ERNS_8SignPost8IntervalILNS2_11CAEventCodeE835322056EEE + 44
21 QuartzCore 0x0000000189dc46d4 _ZN2CA7Display11DisplayLink14dispatch_itemsEyyy + 804
22 QuartzCore 0x0000000189dc42a0 _ZN2CA7Display11DisplayLink8callbackEP15_CADisplayTimeryyybPv + 632
23 QuartzCore 0x0000000189ecab9c _ZL22display_timer_callbackP12__CFMachPortPvlS1_ + 336
24 CoreFoundation 0x00000001803b84e4 __CFMachPortPerform + 172
25 CoreFoundation 0x00000001803eeddc __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 56
26 CoreFoundation 0x00000001803ee3ac __CFRunLoopDoSource1 + 496
27 CoreFoundation 0x00000001803e89bc __CFRunLoopRun + 2176
28 CoreFoundation 0x00000001803e7d28 CFRunLoopRunSpecific + 572
29 GraphicsServices 0x000000018e7cdbc0 GSEventRunModal + 160
30 UIKitCore 0x00000001852bafdc -[UIApplication _run] + 868
31 UIKitCore 0x00000001852bec54 UIApplicationMain + 124
32 SwiftUI 0x00000001c4b04524 OUTLINED_FUNCTION_70 + 500
33 SwiftUI 0x00000001c4b043c4 OUTLINED_FUNCTION_70 + 148
34 SwiftUI 0x00000001c4816108 OUTLINED_FUNCTION_2 + 92
35 DRiP Haus 0x00000001047a5e80 $s9DRiP_Haus6iOSAppV5$mainyyFZ + 40
36 DRiP Haus 0x00000001047a5f30 main + 12
Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions
We've heard that mutex implementations like ours can suffer from QoS problems (Kotlin/kotlinx-atomicfu#462), and this looks like solid proof. Thanks for the codebase! I'm sure it will be invaluable in our investigation.
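For the record, a minimal sketch of the inversion the checker describes; the dispatching details and names here are assumptions for illustration, not code from the app or the libraries:

import kotlinx.atomicfu.locks.SynchronizedObject
import kotlinx.atomicfu.locks.synchronized
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

// Illustration of the suspected priority inversion.
private val sharedStateLock = SynchronizedObject()

fun startBackgroundWork(scope: CoroutineScope) {
    // Coroutine workers typically run below the main thread's User-interactive QoS.
    scope.launch(Dispatchers.Default) {
        synchronized(sharedStateLock) {
            // Short critical section, but the worker can be preempted while holding the lock.
        }
    }
}

fun onFrame() {
    // Called on the main (User-interactive) thread, e.g. while a frame is being drawn.
    synchronized(sharedStateLock) {
        // If a lower-QoS worker holds the lock, the main thread parks here without
        // boosting the holder's priority, which is exactly what the checker reports.
    }
}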