DISTRHO/JUCE

X-run and scheduling problem in multi-thread plugins


When I run a plugin that does a considerable amount of work in background threads, I observe many xruns.

I believe the cause is poor thread handling on the Juce side.
With a very simple fix I have been able to eliminate the xruns, and I'd like to discuss whether the solution is valid.
https://github.com/DISTRHO/juce/blob/e17dde701676585f2f5f67cb6f80a77ae79bf095/modules/juce_core/native/juce_posix_SharedCode.h#L1046
Change it to: policy = priority < 9 ? SCHED_OTHER : SCHED_RR;
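For illustration, here is a minimal sketch of the policy choice before and after the change. It assumes the original line picks SCHED_RR for every Juce priority other than 0; the helper names are hypothetical and not part of Juce.

```cpp
#include <sched.h>  // SCHED_OTHER, SCHED_RR

// Hypothetical helpers; jucePriority is Juce's 0-10 thread priority.

// Original behaviour (assumed): every priority except 0 becomes SCHED_RR.
constexpr int originalPolicy(int jucePriority)
{
    return jucePriority == 0 ? SCHED_OTHER : SCHED_RR;
}

// Patched behaviour: only 9 and 10 (9 is what Juce uses for realtime
// audio threads) get SCHED_RR; everything else stays SCHED_OTHER.
constexpr int patchedPolicy(int jucePriority)
{
    return jucePriority < 9 ? SCHED_OTHER : SCHED_RR;
}
```

With the patch, a default-priority thread (5) is no longer a realtime thread, while a thread started with Juce's realtime audio priority still is.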

Explanation

When a Thread object is instantiated, it is assigned priority 5, which Juce considers the normal priority on its 0-10 scale.
https://github.com/DISTRHO/juce/blob/e17dde701676585f2f5f67cb6f80a77ae79bf095/modules/juce_core/threads/juce_Thread.h#L311

As seen in the original code above, priority 5, like any other value != 0, selects the SCHED_RR realtime scheduling policy. As a result, every ordinary thread that gets instantiated, including Juce's own, is treated as a realtime task. As I understand it, these tasks then compete in the scheduler with the audio tasks, which explains the xruns.
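To make the numbers concrete, the sketch below assumes the linked Juce code scales its 0-10 priority linearly onto the range the OS reports for the chosen policy, roughly ((max - min) * priority) / 10 + min; the exact formula should be checked at the linked line. The function names are hypothetical.

```cpp
#include <sched.h>  // sched_get_priority_min, sched_get_priority_max, SCHED_*

// Assumed linear scaling of Juce's 0-10 priority onto [minPrio, maxPrio]
// (hypothetical helper; compare with the linked Juce source).
constexpr int scaleJucePriority(int juce, int minPrio, int maxPrio)
{
    return ((maxPrio - minPrio) * juce) / 10 + minPrio;
}

// Pre-patch rule: any priority other than 0 selects SCHED_RR, and the value
// is scaled onto the range the OS reports for that policy.
inline int originalSchedPriority(int juce)
{
    const int policy = (juce == 0) ? SCHED_OTHER : SCHED_RR;
    return scaleJucePriority(juce, sched_get_priority_min(policy),
                             sched_get_priority_max(policy));
}
```

On Linux, where SCHED_RR spans 1 to 99, this puts a default-priority thread (5) at rtprio 50 and Juce's realtime value 9 at rtprio 89.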

I picked 9 as the threshold for RR because it is the value Juce uses for realtime threads, as shown here:
https://github.com/DISTRHO/juce/blob/e17dde701676585f2f5f67cb6f80a77ae79bf095/modules/juce_core/threads/juce_Thread.cpp#L144-L145

One thing that remains unclear is why, on my side, the problem occurred only when the LV2 plugin ran in a host, and never when it ran standalone.
My testing was on a personal branch with an updated Juce, but as far as I can see this version suffers from the same implementation problem.

Addendum: user @sub26nico has confirmed that the xrun problem is related to the UI being shown and active.
He confirmed that the patch disabling SCHED_RR on ordinary pthreads resolved the problem entirely.

This does not only concern my own plugins; Tunefish was reported to me as having the same issue:
"tunefish blows up cpu usage and xruns on opening the UI! dsp at 100% and cascaded xruns"
I implemented its LV2 support with the help of DISTRHO/JUCE: https://github.com/jpcima/tunefish/tree/lv2

I believe the UIs most affected by the problem are those with dynamically drawn displays that update during playback.

More detail is given in this Flyspray issue 🇫🇷 (English translation 🇺🇸).

About the patch: it has one downside, which is that Juce thread priorities lose their intended effect.
Under SCHED_OTHER, the pthread priority range has zero width (min and max are both 0), so every Juce priority maps to the same value. It's not a very satisfying outcome, but it is much preferable to xruns.
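A small check of this limitation, using the POSIX sched_get_priority_min and sched_get_priority_max calls. The linear 0-10 scaling is an assumption about Juce's formula, and the helper names are hypothetical.

```cpp
#include <cassert>  // assert
#include <sched.h>  // sched_get_priority_min, sched_get_priority_max, SCHED_*

// Width of the static priority range the OS exposes for a scheduling policy.
// On Linux: SCHED_OTHER is 0..0 (width 0), SCHED_RR is 1..99 (width 98).
inline int priorityRangeWidth(int policy)
{
    return sched_get_priority_max(policy) - sched_get_priority_min(policy);
}

// Assumed Juce-style linear scaling of a 0-10 priority onto that range
// (hypothetical helper). With a width of 0, every input gives the same result.
inline int scaledPriority(int policy, int juce)
{
    assert(juce >= 0 && juce <= 10);  // Juce priorities use a 0-10 scale
    return (priorityRangeWidth(policy) * juce) / 10 + sched_get_priority_min(policy);
}
```

On Linux, scaledPriority(SCHED_OTHER, p) returns 0 for every p, which is why the patched threads below 9 all end up indistinguishable to the scheduler's static priority.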

Thanks for the heads up on this.
I agree it is not the best outcome, but better than xruns for sure.
I will apply your patch.

Handled in 08c983c
Thanks again.

Thanks. In the meantime, we have in fact discovered some new information about this problem.
(thanks to @trebmuh, @sub26nico)

We know Juce assigns fixed RR priorities to some of its own threads, the highest non-audio one being, I believe, the message thread, in order to improve the responsiveness of the user interface. (I'm sure Jules once mentioned this on the JUCE forums.)

In the external problem report by @sub26nico, the test that provoked xruns was performed with Jack's realtime priority set to 70. With this setting, Jack clients obtain a priority slightly below the configured value.

If my memory serves me right, this setup made audio processing run at rtprio 65. Meanwhile, Juce always runs its highest thread at the fixed priority 69.

This means that there is a threshold in Jack's setting below which Juce's thread takes precedence in the scheduler.
As we observed, with Jack at rtprio 80 or higher, Juce's thread runs below the audio thread.
It's possible that xruns would not happen in this setup, but I would have to redo the experiment to confirm.
ADDENDUM: @sub26nico confirmed that the problem happens at rtprio 70 and no longer at rtprio 80.
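A minimal model of this threshold. It assumes Jack clients run their audio thread 5 below Jack's configured rtprio (taken from the two reports in this thread: 70 giving 65, and 10 giving 5) and that Juce's highest thread sits at the reported 69; both numbers are assumptions, not values read from Jack or Juce. The names are hypothetical.

```cpp
constexpr int kAssumedClientOffset = 5;     // assumption: 70 -> 65 and 10 -> 5 in the reports
constexpr int kJuceHighestThreadPrio = 69;  // rtprio reported for Juce's highest thread

// Hypothetical helper: rtprio of a Jack client's audio thread.
constexpr int audioThreadPrio(int jackRtPrio, int clientOffset = kAssumedClientOffset)
{
    return jackRtPrio - clientOffset;
}

// When both threads want the same CPU, a realtime (SCHED_FIFO/SCHED_RR)
// thread with a strictly higher static priority preempts the lower one;
// equal priorities do not preempt each other.
constexpr bool juceThreadOutranksAudio(int jackRtPrio, int juceRtPrio = kJuceHighestThreadPrio)
{
    return juceRtPrio > audioThreadPrio(jackRtPrio);
}
```

Under these assumptions, the model agrees with the observations: at Jack rtprio 70 the Juce thread (69) outranks the audio thread (65), while at 80 the audio thread (75) stays on top.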

About my synth project, I received an xrun report from a user who ran Jack at rtprio 10 (client priority 5).
I expect many users to experience the issue, since 10 is the default setting of Jack-Dbus, very far from the recommended value of 80.
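Applying the same arithmetic to the pre-patch rule, the sketch counts how many Juce priority levels would outrank the audio thread for a given client rtprio. It assumes the linear scaling onto Linux's SCHED_RR range 1 to 99 (an assumption about Juce's formula); the helper names are hypothetical.

```cpp
// Linux SCHED_RR static priority range, as returned by
// sched_get_priority_min/max(SCHED_RR).
constexpr int kRrMin = 1;
constexpr int kRrMax = 99;

// Assumed pre-patch mapping: Juce priority 1-10 -> SCHED_RR rtprio.
constexpr int originalRtPrio(int juce)
{
    return ((kRrMax - kRrMin) * juce) / 10 + kRrMin;
}

// Number of Juce priority levels whose threads would outrank an audio thread
// running at audioRtPrio. Level 0 is skipped: it stays SCHED_OTHER.
constexpr int levelsOutrankingAudio(int audioRtPrio)
{
    int count = 0;
    for (int juce = 1; juce <= 10; ++juce)
        if (originalRtPrio(juce) > audioRtPrio)
            ++count;
    return count;
}
```

For a client at rtprio 5, all ten non-zero levels outrank the audio thread, including the default priority 5 (rtprio 50); for a client at 65, only levels 7 to 10 do.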

Personally, I doubt that running non-audio computation at realtime priority is a good idea in any circumstance, but I'm certainly not knowledgeable enough about the Linux scheduler to judge.