EmbarkStudios/puffin

puffin_egui deadlock if new_frame is called from another thread

lunixbochs opened this issue · 1 comments

I'm running into a deadlock in puffin_egui if new_frame() is called from a thread other than the UI thread.
I built a sampling profiler that runs in a single thread, manually creates ThreadProfiler instances for all threads running in the process, and samples their stack traces at intervals, so I'm calling new_frame from that thread rather than my UI thread.

Here puffin takes the GlobalFrameView lock and holds it across the UI function call:

pub fn window(&mut self, ctx: &egui::Context) -> bool {
let mut frame_view = self.global_frame_view.lock();
self.profiler_ui
.window(ctx, &mut MaybeMutRef::MutRef(&mut frame_view))
}
/// Show the profiler.
///
/// Call this from within an [`egui::Window`], or use [`Self::window`] instead.
pub fn ui(&mut self, ui: &mut egui::Ui) {
let mut frame_view = self.global_frame_view.lock();
self.profiler_ui
.ui(ui, &mut MaybeMutRef::MutRef(&mut frame_view));
}

The profiler ui call itself contains profiling calls, which calls ThreadProfiler::end_scope internally, which tries to take the GlobalProfiler lock. So you have a nested GlobalFrameView lock -> GlobalProfiler lock acquisition in the UI.

GlobalProfiler::lock().new_frame() has the GlobalProfiler locked and calls the sink function here, which tries to lock the GlobalFrameView:

view_clone.lock().add_frame(frame);

This mismatched lock order triggers a deadlock.

This seems like an interesting issue (also your sampling profiler sounds very cool).

I'm a bit confused on the exact order of these locks and accesses - I'm not familiar with puffin internals - could you possibly explain it again if my explanation is wrong? It seems like we have one thread doing GlobalFrameView::lock() -> GlobalProfiler::lock(), with another doing the reverse order GlobalProfiler::lock() -> GlobalFrameView::lock().

I assume that using a re-entrant Mutex would not fix this problem, because they are executing on different thread. Perhaps we could try using a proxy GlobalProfiler instance, so that the GlobalProfiler::SINGLETON writes to the profiler UI's proxy profiler, and the profiler UI only ever reads the proxy profiler?

A better alternative might be to try to lock the profiler, read all the data from it, and then unlock and call the profiler UI using the data we read, instead of accessing the GlobalProfiler. This would mean the reading of data couldn't be orofiler, but would allow the profiler UI to contain profiling calls as usual.