mtytel/vital

loading LV2 plugin freezes hosts

Be-ing opened this issue · 29 comments

Loading the LV2 plugin with either Ardour or Zrythm freezes the host. Carla does not detect the plugin.

I can open in jalv.qt5, but generic GUI, also jalv.gtk3 loads in JACK but no GUI...

REIS0 commented

In my case carla opens the plugin but crashes when opening the GUI

I'm also having this problem, but I'm on a weird architecture, so I can't be sure that any of my problems are also experienced on x86. In my case, I can run the standalone version fine, but when I load the LV2 plugin, there's a deadlock in the InternalRunLoop class in third_party/JUCE/modules/juce_events/native/juce_linux_Messaging.cpp. I'm still investigating exactly why it's happening.

If people on x86 experiencing these freezes could check if the cause is the same, that would be helpful. In my case, I ran Ardour in GDB with ardour -g, loaded the LV2 version of Vital I just compiled, and after it froze Ardour, I pressed CTRL+C and printed a backtrace with bt, and it was clear that the code was stuck waiting for a lock in InternalRunLoop::registerFdCallback().

bepzi commented

If people on x86 experiencing these freezes could check if the cause is the same, that would be helpful.

Yep. Here's the (truncated) backtrace on x86_64:

(gdb) bt
#0  0x00007ffff56fbe7e in __pthread_mutex_lock_full () at /usr/lib/libpthread.so.0
#1  0x00007fff5b37f28e in juce::InternalRunLoop::registerFdCallback(int, std::function<void (int)>&&, short) () at /usr/lib/lv2/Vital.lv2/Vital.so
#2  0x00007fff5b50e309 in juce::XWindowSystem::initialiseXDisplay() () at /usr/lib/lv2/Vital.lv2/Vital.so
#3  0x00007fff5b8ebe09 in juce::SingletonHolder<juce::XWindowSystem, juce::CriticalSection, false>::get() [clone .constprop.0] ()
    at /usr/lib/lv2/Vital.lv2/Vital.so
#4  0x00007fff5b458b57 in juce::Desktop::Desktop() () at /usr/lib/lv2/Vital.lv2/Vital.so
#5  0x00007fff5b459985 in juce::Desktop::getInstance() () at /usr/lib/lv2/Vital.lv2/Vital.so
#6  0x00007fff5b4970cc in juce::LookAndFeel::getDefaultLookAndFeel() () at /usr/lib/lv2/Vital.lv2/Vital.so
#7  0x00007fff5b4db02e in juce::Slider::Slider(juce::String const&) () at /usr/lib/lv2/Vital.lv2/Vital.so
#8  0x00007fff5b7ebf8c in SynthSlider::SynthSlider(juce::String) () at /usr/lib/lv2/Vital.lv2/Vital.so
#9  0x00007fff5b62744f in std::_MakeUniq<SynthSlider>::__single_object std::make_unique<SynthSlider, std::string>(std::string&&) ()
    at /usr/lib/lv2/Vital.lv2/Vital.so
#10 0x00007fff5b5cea0d in FilterSection::FilterSection(juce::String, juce::String) () at /usr/lib/lv2/Vital.lv2/Vital.so
#11 0x00007fff5b5d976f in FilterSection::FilterSection(int, std::map<std::string, vital::Output*, std::less<std::string>, std::allocator<std::pair<std::string const, vital::Output*> > > const&, std::map<std::string, vital::Output*, std::less<std::string>, std::allocator<std::pair<std::string const, vital::Output*> > > const&) () at /usr/lib/lv2/Vital.lv2/Vital.so
#12 0x00007fff5b5ccc27 in SynthesisInterface::SynthesisInterface(Authentication*, std::map<std::string, vital::Output*, std::less<std::string>, std::allocator<std::pair<std::string const, vital::Output*> > > const&, std::map<std::string, vital::Output*, std::less<std::string>, std::allocator<std::pair<std::string const, vital::Output*> > > const&) () at /usr/lib/lv2/Vital.lv2/Vital.so
#13 0x00007fff5b60f629 in FullInterface::FullInterface(SynthGuiData*) () at /usr/lib/lv2/Vital.lv2/Vital.so
#14 0x00007fff5b8cad28 in SynthGuiInterface::SynthGuiInterface(SynthBase*, bool) [clone .constprop.0] () at /usr/lib/lv2/Vital.lv2/Vital.so
#15 0x00007fff5b25c0a0 in SynthEditor::SynthEditor(SynthPlugin&) () at /usr/lib/lv2/Vital.lv2/Vital.so
#16 0x00007fff5b25d0ff in juceLV2UI_Instantiate(void (*)(void*, unsigned int, unsigned int, unsigned int, void const*), void*, void**, _LV2_Feature const* const*, bool) () at /usr/lib/lv2/Vital.lv2/Vital.so
#17 0x00007ffff5290b52 in suil_instance_new () at /usr/lib/libsuil-0.so.0
# ... GTK stuff omitted
#48 0x0000555555a08aba in main ()

I believe I've fixed this issue in my fork of Vital, along with a number of other fixes. See this commit specifically for details about what was causing the issue and for the one-line fix.

bepzi commented

Just to chime in, the fix in @taylordotfish's repo does seem to work. I'm able to build and run the LV2 version in Ardour with no hiccups (yet) :)

You probably don't want to undo the lock there since I think that could to things that require it. Probably can just remove the lock from the getUI function in third_party/JUCE/modules/juce_audio_plugin_client/LV2/juce_LV2_Wrapper.cpp
That's what I've done to fix that in the next version. (source will get updated at some point)

@mtytel Yeah, I was a bit unsure about undoing the lock there, but I'm also concerned about removing the lock in getUI(), since JUCE's documentation says “it's not thread-safe to call any of the Component or other UI classes from threads other than the message thread” without locking it, and in the plugin hosts I've tried, getUI() definitely gets called from a thread other than the message thread.

I think the ScopedUnlock in dispatchPendingEvents() is correct, even though it's confusing—the only function that calls dispatchPendingEvents() is LinuxEventLoop::dispatchNextMessageOnSystemQueue(), which is labeled with the comment // this function expects that it will NEVER be called simultaneously for two concurrent threads, and my cursory overview of the rest of JUCE seems to confirm that (and it makes sense, given that messages should only ever be handled on the message thread). shouldDeferModifyingReadCallbacks gets set to true before unlocking, so calls to registerFdCallback() or unregisterFdCallback() won't invalidate the data dispatchPendingEvents() is using, and concurrent calls to (un)registerFdCallback() won't race with each other since they always obtain the lock first.

I definitely think there's a bug in the upstream JUCE implementation—without unlocking there, shouldDeferModifyingReadCallbacks is completely useless: the only code that changes its behavior based on that value is registerFdCallback() and unregisterFdCallback(), but both of those functions obtain the lock before even checking the value of that variable. But by the time they're able to obtain the lock, dispatchPendingEvents() has necessarily returned and reset the value of shouldDeferModifyingReadCallbacks to false. So it's always false when (un)registerFdCallback() runs.

But I'm not particularly familiar with JUCE and may very well be wrong, so if anyone has an explanation of a data race that could occur due to unlocking there, please let me know.

Here's an alternative fix for the deadlock. It's a larger change, but I think it’s easier to reason about the correctness of this code.

With this change, registerFdCallback() and unregisterFdCallback() never modify the callback/file descriptor vectors directly; instead, they push their changes onto the pendingAdditions and pendingRemovals vectors, and dispatchPendingEvents() is responsible for applying those pending changes. Access to pendingAdditions and pendingRemovals is guarded with a separate lock, pendingChangesLock, which, unlike lock, isn't held during the entirety of dispatchPendingEvents()'s execution.

I guess my hesitancy is editing more central JUCE code when it should be more on the LV2 wrapping side.
You could be right about the getUI thing, I didn't check if it was called on the message thread yet, might have made a bad assumption.

The core code works for the VST/VST3/standalone and used to work for LV2 so I'm assuming the VST wrappers were updated and LV2 needs some changes (since I didn't touch much when upgrading juce)

So ideally a fix would just have changes to the LV2 wrapper but I haven't dug deep into this yet.

Yeah, I would also prefer a solution that doesn't involve modifying JUCE's internals. The fact that there's even a deadlock at all is actually itself an indication that getUI() gets called from a non-message thread, as all CriticalSection locks are actually recursive mutexes, so you wouldn't see a deadlock unless multiple threads are involved.

jypma commented

Wanting to try out this build on arch linux, I created a little PKGBUILD to help me install it. The LV2 version of the this fork loads fine in Ardour :-) Here's the PKGBUILD (small adjustments from the binary one already in AUR:

pkgname=vital-synth-lv2-git
pkgver=r32.ebeeaab
pkgver() {
  cd vital
  printf "r%s.%s" "$(git rev-list --count HEAD)" "$(git rev-parse --short HEAD)"
}
pkgrel=1
pkgdesc="Powerful wavetable synthesizer with realtime modulation feedback. Vital is a MIDI enabled polyphonic music synthesizer with an easy to use parameter modulation system with real-time graphical feedback."
arch=('x86_64')
url="https://vital.audio"
license=('custom')
depends=('alsa-lib>=1.0.16' 'freetype2>=2.2.1' 'gcc-libs' 'gcc>=3.3.1' 'glib2>=2.12.0' 'glibc>=2.17' 'libcurl-gnutls>=7.16.2' 'libgl' 'libglvnd' 'libsecret>=0.7')
source_x86_64=("git+https://github.com/taylordotfish/vital.git")
sha512sums_x86_64=('SKIP')

build() {
    cd vital
    make lv2
}

package() {
    cd vital
    make install_lv2 PREFIX="${pkgdir}/usr"
}

Hope that it helps others wanting to try out and/or contribute to this great synth :-)

Here's an alternative fix for the deadlock. It's a larger change, but I think it’s easier to reason about the correctness of this code.

With this change, registerFdCallback() and unregisterFdCallback() never modify the callback/file descriptor vectors directly; instead, they push their changes onto the pendingAdditions and pendingRemovals vectors, and dispatchPendingEvents() is responsible for applying those pending changes. Access to pendingAdditions and pendingRemovals is guarded with a separate lock, pendingChangesLock, which, unlike lock, isn't held during the entirety of dispatchPendingEvents()'s execution.

In jalv.gtk3 this crashes as soon as I try interacting with the UI:

[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
jalv.gtk3: ../../src/xcb_io.c:263: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted

@dromer Hm, I can't reproduce this—if you build the commit right before that (c85a58c91c8f118cbd10a88e17f5ebc8c16caa12), do you still get that error?

Nope, that commit also freezes/crashes the UI. Occasionally I can interact with it briefly, but then it crashes again.

Btw HEAD does work fine in Carla as far as I can tell

@taylordotfish latest HEAD (ebeeaab) through gdb:

[New Thread 0x7fffe91ce700 (LWP 13810)]
[New Thread 0x7fffe3b7b700 (LWP 13811)]
[New Thread 0x7fffe2b77700 (LWP 13812)]
[New Thread 0x7fffd3fff700 (LWP 13813)]
[New Thread 0x7fffd37fe700 (LWP 13814)]
[New Thread 0x7fffd0bb7700 (LWP 13815)]
[xcb] Unknown sequence number while processing reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
jalv.gtk3: ../../src/xcb_io.c:643: _XReply: Assertion `!xcb_xlib_threads_sequence_lost' failed.

Thread 4 "jalv.gtk3" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe9a50700 (LWP 13808)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6f497bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6f34535 in __GI_abort () at abort.c:79
#2  0x00007ffff6f3440f in __assert_fail_base
    (fmt=0x7ffff7096ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff695c808 "!xcb_xlib_threads_sequence_lost", file=0x7ffff695c673 "../../src/xcb_io.c", line=643, function=<optimized out>) at assert.c:92
#3  0x00007ffff6f42102 in __GI___assert_fail (assertion=0x7ffff695c808 "!xcb_xlib_threads_sequence_lost", file=0x7ffff695c673 "../../src/xcb_io.c", line=643, function=0x7ffff695ca88 "_XReply") at assert.c:101
#4  0x00007ffff68eb1be in _XReply () at /lib/x86_64-linux-gnu/libX11.so.6
#5  0x00007ffff68e110e in XQueryPointer () at /lib/x86_64-linux-gnu/libX11.so.6
#6  0x00007fffeac4c57d in juce::MouseInputSource::getCurrentRawMousePosition() () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#7  0x00007fffeacd9485 in juce::Desktop::resetTimer() () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#8  0x00007fffead20369 in juce::MouseInputSourceInternal::setScreenPos(juce::Point<float>, juce::Time, bool) () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#9  0x00007fffeac55d42 in juce::WindowingHelpers::windowMessageReceive(_XEvent&) [clone .lto_priv.4242] () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#10 0x00007fffeac3a867 in std::_Function_handler<void (int), juce::XWindowSystem::initialiseXDisplay()::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&) () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#11 0x00007fffeae8282c in SharedMessageThread::run() () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#12 0x00007fffeadc353a in juce::threadEntryProc(void*) [clone .lto_priv.6889] () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#13 0x00007ffff7f52fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007ffff700b4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@mtytel I came up with a workaround for the deadlock that doesn't require changes to any part of JUCE besides the LV2 wrapper. The change is simply to instantiate the UI on the message thread itself.

(I do still think there's a bug in juce_linux_Messaging.cpp, but the right way of getting that fixed would really be to report it to the JUCE developers. This workaround seems to work in at least Ardour, Carla, and jalv.gtk3.)

@dromer Could you try out this change? I'm wondering if this will also fix the issue you're experiencing, as it changes which thread initialiseXDisplay() gets run on.

@taylordotfish ok! fingers crossed it doesn't take down my entire desktop this time (yes that happened on the other test you had me do :P)

[New Thread 0x7fffe91ce700 (LWP 16346)]
[New Thread 0x7fffe3b7b700 (LWP 16347)]
[New Thread 0x7fffe2b77700 (LWP 16348)]
[New Thread 0x7fffd3fff700 (LWP 16349)]
[New Thread 0x7fffd37fe700 (LWP 16350)]
[xcb] Unknown sequence number while processing reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
jalv.gtk3: ../../src/xcb_io.c:643: _XReply: Assertion `!xcb_xlib_threads_sequence_lost' failed.

Thread 9 "Pool" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd3fff700 (LWP 16349)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6f497bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6f34535 in __GI_abort () at abort.c:79
#2  0x00007ffff6f3440f in __assert_fail_base
    (fmt=0x7ffff7096ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff695c808 "!xcb_xlib_threads_sequence_lost", file=0x7ffff695c673 "../../src/xcb_io.c", line=643, function=<optimized out>) at assert.c:92
#3  0x00007ffff6f42102 in __GI___assert_fail
    (assertion=0x7ffff695c808 "!xcb_xlib_threads_sequence_lost", file=0x7ffff695c673 "../../src/xcb_io.c", line=643, function=0x7ffff695ca88 "_XReply") at assert.c:101
#4  0x00007ffff68eb1be in _XReply () at /lib/x86_64-linux-gnu/libX11.so.6
#5  0x00007fffe8119cb6 in  () at /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#6  0x00007fffe80cca8c in  () at /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#7  0x00007fffe80ccba8 in  () at /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#8  0x00007fffe80f48e9 in  () at /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#9  0x00007fffe80eaa17 in glXCreateContextAttribsARB () at /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#10 0x00007fffea0b905c in  () at /lib/x86_64-linux-gnu/libGLX.so.0
#11 0x00007fffeaa4393a in juce::OpenGLContext::CachedImage::runJob() [clone .lto_priv.6227] () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#12 0x00007fffeadd1f1f in juce::ThreadPool::ThreadPoolThread::run() () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#13 0x00007fffeadc337a in juce::threadEntryProc(void*) [clone .lto_priv.6889] () at /home/dreamer/.lv2/Vial.lv2/Vial.so
#14 0x00007ffff7f52fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#15 0x00007ffff700b4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@dromer Shoot. I wish I could investigate this more, but it's difficult as I can't reproduce it, and I don't have much spare time in the first place. I'll gladly accept a PR in my fork if someone manages to fix it, though.

Top, no biggie for me. I'm already happy that the .lv2 works with Carla :)

FYI I just updated DISTRHO/JUCE to latest upstream JUCE, rebasing all patches.
The lockup when showing LV2 UI was a very easy fix, as some things in JUCE now require X11 even before an UI is even visible on the screen. Fixed up in DISTRHO/JUCE@a385a08

The hero @falkTX arrives 🎉

@falkTX I've been trying to build Vital with the newest version of DISTRHO/JUCE, and the UI shows without locking up, but in Ardour and jalv.gtk3, I get a segfault when I close the UI window:

#0  0x00003ffff6b46004 in __dynamic_cast () at /usr/lib/powerpc64le-linux-gnu/libstdc++.so.6
#1  0x00003fffe8a138f8 in juce::XWindowSystem::windowMessageReceive(_XEvent&) (event=...) at ../../../third_party/JUCE/modules/juce_gui_basics/native/x11/juce_linux_XWindowSystem.cpp:3680
#2  0x00003fffe8a0ff24 in operator()(int) const (__closure=0x3fffe0000c80) at ../../../third_party/JUCE/modules/juce_gui_basics/native/x11/juce_linux_XWindowSystem.cpp:3046
#3  0x00003fffe8a2c7b8 in std::__invoke_impl<void, juce::XWindowSystem::initialiseXDisplay()::<lambda(int)>&, int>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#4  0x00003fffe8a24c5c in std::__invoke_r<void, juce::XWindowSystem::initialiseXDisplay()::<lambda(int)>&, int>(struct {...} &) (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
#5  0x00003fffe8a1e4fc in std::_Function_handler<void(int), juce::XWindowSystem::initialiseXDisplay()::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x3fffe71ad168: 21)
    at /usr/include/c++/10/bits/std_function.h:291
#6  0x00003fffe879fcf4 in std::function<void (int)>::operator()(int) const (this=0x3fffe0000c80, __args#0=21) at /usr/include/c++/10/bits/std_function.h:622
#7  0x00003fffe8796ae4 in juce::InternalRunLoop::dispatchPendingEvents() (this=0x3fffe0000bd0) at ../../../third_party/JUCE/modules/juce_events/native/juce_linux_Messaging.cpp:186
#8  0x00003fffe878dcdc in juce::MessageManager::dispatchNextMessageOnSystemQueue(bool) (returnIfNoPendingMessages=false) at ../../../third_party/JUCE/modules/juce_events/native/juce_linux_Messaging.cpp:300
#9  0x00003fffe87856dc in juce::MessageManager::runDispatchLoop() (this=0x3fffe0000ba0) at ../../../third_party/JUCE/modules/juce_events/messages/juce_MessageManager.cpp:128
#10 0x00003fffe78c824c in SharedMessageThread::run() (this=0x100173b20) at ../../../third_party/JUCE/modules/juce_audio_plugin_client/LV2/juce_LV2_Wrapper.cpp:160
#11 0x00003fffe85d74dc in juce::Thread::threadEntryPoint() (this=0x100173b20) at ../../../third_party/JUCE/modules/juce_core/threads/juce_Thread.cpp:96
#12 0x00003fffe85d7624 in juce::juce_threadEntryPoint(void*) (userData=0x100173b20) at ../../../third_party/JUCE/modules/juce_core/threads/juce_Thread.cpp:118
#13 0x00003fffe8608038 in juce::threadEntryProc(void*) (userData=0x100173b20) at ../../../third_party/JUCE/modules/juce_core/native/juce_posix_SharedCode.h:877
#14 0x00003ffff7f2856c in start_thread (arg=0x3fffe71ade10) at pthread_create.c:477

I didnt get that far yet. Was updating juce as first step.
I suspect this simply needs a messagemanager lock on the lv2 ui hide/cleanup function.

I have Vital as Vitalium in https://github.com/DISTRHO/DISTRHO-Ports now.
But still needs tweaks for the branding and general testing.

oh, for building, you might want to use

meson build --buildtype release -Dplugins=vitalium

so it skips all other plugins except vitalium

The crash on close has been "fixed" on DISTRHO/DISTRHO-Ports@8f7eed6
I do not question that there is a better solution out there, but juce grew too much that is hard to understand the entire backend now. I am personally okay with this workaround for now.

@taylordotfish In your fork, when editing wavetables, the "Remove Group" button does not seem to remove individual sources, meaning that one cannot properly switch out a wave Source with, e.g, a line source.

Either that, or it's just an issue with running Vial as an LV2. I can't tell, since running the original just freezes Ardour whenever I try to open up the GUI as per this issue.