Deadlock in v5.4.5
danwood opened this issue · 4 comments
Is this a support request?
no, it's a bug report
Describe the bug
We got a deadlock between the main thread and the com.launchdarkly.DiagnosticCache.cacheQueue thread: the main thread is waiting on the cacheQueue thread, and that thread is waiting on the main thread.
To reproduce
Not reproducible :-(
Expected behavior
We should not have a deadlock :-)
Logs
Main Thread: (excerpted, this is the part called from our app's code)
1000 LDClient.variationInternal<A>(forKey:defaultValue:includeReason:) + 2736 (LaunchDarkly + 303776) [0x1057e62a0]
1000 EventReporter.recordFlagEvaluationEvents(flagKey:value:defaultValue:featureFlag:user:includeReason:) + 992 (LaunchDarkly + 186444) [0x1057c984c]
1000 _dispatch_sync_f_slow + 144 (libdispatch.dylib + 76944) [0x185999c90]
1000 __DISPATCH_WAIT_FOR_QUEUE__ + 336 (libdispatch.dylib + 78028) [0x18599a0cc]
1000 _dispatch_event_loop_wait_for_ownership + 444 (libdispatch.dylib + 161256) [0x1859ae5e8]
1000 kevent_id + 8 (libsystem_kernel.dylib + 13960) [0x185b0f688]
*1000 ??? (kernel.release.t6000 + 5642160) [0xfffffe0007e757b0] (blocked by turnstile waiting for #####REDACTED##### [65883] [unique pid 165569] thread 0x27f1bc)
Thread 0x27f1bc:
Thread 0x27f1bc DispatchQueue "com.launchdarkly.DiagnosticCache.cacheQueue"(562) 1000 samples (1-1000) priority 46 (base 46)
1000 start_wqthread + 8 (libsystem_pthread.dylib + 8900) [0x185b442c4]
1000 _pthread_wqthread + 288 (libsystem_pthread.dylib + 13744) [0x185b455b0]
1000 _dispatch_workloop_worker_thread + 656 (libdispatch.dylib + 91912) [0x18599d708]
1000 _dispatch_lane_invoke + 392 (libdispatch.dylib + 48804) [0x185992ea4]
1000 _dispatch_lane_serial_drain + 672 (libdispatch.dylib + 45872) [0x185992330]
1000 _dispatch_client_callout + 20 (libdispatch.dylib + 15276) [0x18598abac]
1000 _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7776) [0x185988e60]
1000 thunk for @escaping @callee_guaranteed () -> () + 28 (LaunchDarkly + 384280) [0x1057f9d18]
1000 closure #1 in LDTimer.timerFired() + 128 (LaunchDarkly + 385392) [0x1057fa170]
1000 EventReporter.reportEvents(completion:) + 1220 (LaunchDarkly + 188704) [0x1057ca120]
1000 _dispatch_lane_barrier_sync_invoke_and_complete + 56 (libdispatch.dylib + 77312) [0x185999e00]
1000 _dispatch_client_callout + 20 (libdispatch.dylib + 15276) [0x18598abac]
1000 thunk for @escaping @callee_guaranteed () -> () + 20 (LaunchDarkly + 285612) [0x1057e1bac]
1000 thunk for @callee_guaranteed () -> () + 20 (LaunchDarkly + 285580) [0x1057e1b8c]
1000 closure #1 in DiagnosticCache.updateStoredDataSync(updateFunc:) + 168 (LaunchDarkly + 106644) [0x1057b6094]
1000 StoreData.save(_:) + 252 (LaunchDarkly + 105008) [0x1057b5a30]
1000 -[NSNotificationCenter postNotificationName:object:userInfo:] + 96 (Foundation + 42464) [0x186a915e0]
1000 _CFXNotificationPost + 800 (CoreFoundation + 294496) [0x185bd7e60]
1000 -[NSOperation waitUntilFinished] + 584 (Foundation + 349440) [0x186adc500]
1000 __psynch_cvwait + 8 (libsystem_kernel.dylib + 20672) [0x185b110c0]
*1000 psynch_cvcontinue + 0 (pthread + 18008) [0xfffffe000a508a18]
SDK version
5.4.5
Language version, developer tools
Xcode 13.4.1
OS/platform
macOS Monterey 12.4
Thank you for the bug report. I haven't had a chance to dig into this yet but I didn't want to leave you over the weekend without some acknowledgement.
I see you are using v5.4.5. Are you in a position to upgrade to the v6 SDK?
Haven't tried that yet. The strange thing is that we've only started seeing these issues recently, and we've been on 5.4.5 since the end of April. We can investigate moving up to v6, but that's a bigger task than we were hoping for just to avoid some deadlocks.
From what I can see, we're trying to read the flag's current value, which should be an immediate-return operation. I can see how there might be an asynchronous operation that reports the evaluation upstream for diagnostic purposes, but why would the main thread have to wait for that operation to complete before it could return?
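For anyone hitting something similar: the two traces form a classic two-way wait. The main thread does a dispatch_sync onto the serial cacheQueue, while the block currently draining cacheQueue is itself parked (via waitUntilFinished) on work that can only complete on the main thread. A minimal sketch of that shape, with a timeout so it reports the cycle instead of hanging; all names here are ours for illustration, not the SDK's actual internals:

```swift
import Dispatch
import Foundation

// Stand-in for the SDK's serial DiagnosticCache.cacheQueue.
let cacheQueue = DispatchQueue(label: "cacheQueue")
// Stand-in for the operation the cacheQueue thread waits on
// (the -[NSOperation waitUntilFinished] frame in the trace).
let mainSideWork = DispatchSemaphore(value: 0)

// The block currently draining the serial queue parks, waiting for work
// that never arrives, because the thread that would do it blocks below.
cacheQueue.async {
    mainSideWork.wait()
}

// Give the block above a moment to start draining the queue.
Thread.sleep(forTimeInterval: 0.1)

// Now do a *synchronous* hop onto cacheQueue, as dispatch_sync does in the
// EventReporter frame. A serial queue can't start a new block until the
// current one finishes, so this sync can never be admitted. We issue it
// from a helper thread and use a timeout purely so this sketch can report
// the result; in the real app it is the main thread that blocks here.
let entered = DispatchSemaphore(value: 0)
DispatchQueue.global().async {
    cacheQueue.sync { entered.signal() }
}

if entered.wait(timeout: .now() + 1) == .timedOut {
    print("deadlocked: cacheQueue never drained")
    // Breaking either edge of the cycle (signaling mainSideWork, or making
    // the hop onto cacheQueue async) lets both sides proceed.
    mainSideWork.signal()
} else {
    print("no deadlock")
}
```

The fix on either side is the same idea: never block a serial queue's current block on work owned by a thread that may synchronously wait on that queue.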
Going to withdraw our report. I wasn't the one who fixed it, but there was something we were doing in our own code that interacted poorly with the SDK here and caused this.