WebKitGTK-2.32.0 WebProcess crashes with WPEBackend-fdo-1.8.3 and 1.9.90
xry111 opened this issue · 33 comments
With WPEBackend-fdo-1.8.3 or 1.9.90, WebKitGTK-2.32.0 WebProcess crashes when the page is not rendered completely and I go to another URL (click a link or "go back" button, or type a URL and press Enter).
Downgrading to WPEBackend-fdo-1.8.0 fixes the issue, so I think it may be a WPEBackend-fdo issue.
A stack backtrace:
#0 wl_proxy_marshal_constructor
(proxy=0x0, opcode=0, interface=0x7fdf98f17b20 <wl_surface_interface>)
at src/wayland-client.c:829
#1 0x00007fdf9eced8f4 in wl_compositor_create_surface(wl_compositor*)
(wl_compositor=0x0) at /usr/include/wayland-client-protocol.h:1281
#2 0x00007fdf9ecee261 in WS::BaseTarget::initialize(WS::BaseBackend&) (this=
0x7fdebc0016e8, backend=...) at ../src/ws-client.cpp:207
#3 0x00007fdf9ece3f9f in (anonymous namespace)::Target::initialize((anonymous namespace)::Backend&, uint32_t, uint32_t) (this=
0x7fdebc0016e0, backend=..., width=3840, height=1920)
at ../src/renderer-backend-egl.cpp:76
#4 0x00007fdf9ece42f5 in fdo_renderer_backend_egl_target::{lambda(void*, fdo_renderer_backend_egl_target, unsigned int, unsigned int)#3}::operator()(fdo_renderer_backend_egl_target, fdo_renderer_backend_egl_target, unsigned int, unsigned int) const
(__closure=0x0, data=0x7fdebc0016e0, backend_data=0x175b830, width=3840, height=1920) at ../src/renderer-backend-egl.cpp:147
#5 0x00007fdf9ece432c in fdo_renderer_backend_egl_target::{lambda(void*, fdo_renderer_backend_egl_target, unsigned int, unsigned int)#3}::_FUN(fdo_renderer_backend_egl_target, fdo_renderer_backend_egl_target, unsigned int, unsigned int)
() at ../src/renderer-backend-egl.cpp:148
#6 0x00007fdf9f57188b in WTF::Detail::CallableWrapper<WebKit::ThreadedCompositor::ThreadedCompositor(WebKit::ThreadedCompositor::Client&, WebKit::ThreadedDisplayRefreshMonitor::Client&, unsigned int, WebCore::IntSize const&, float, unsigned int)::{lambda()#2}, void>::call() [clone .lto_priv.0] ()
at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#7 0x00007fdf9f565713 in WTF::Detail::CallableWrapper<WebKit::CompositingRunLoop::performTaskSync(WTF::Function<void ()>&&)::{lambda()#1}, void>::call() [clone .lto_priv.0] () at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#8 0x00007fdf9db32dc6 in WTF::RunLoop::performWork() ()
at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#9 0x00007fdf9db98a69 in WTF::RunLoop::RunLoop()::{lambda(void*)#1}::_FUN(void*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#10 0x00007fdf9db8a92f in WTF::RunLoop::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) ()
at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#11 0x00007fdf9a5f8a80 in g_main_context_dispatch ()
at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007fdf9a657c38 in g_main_context_iterate.constprop ()
at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007fdf9a5f871b in g_main_loop_run ()
at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x00007fdf9db96b20 in WTF::RunLoop::run() ()
at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#15 0x00007fdf9db5ddd8 in WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
Let me take a look and see if I can reproduce this… I suppose that “page is not rendered completely” means, for example, that the WEBKIT_LOAD_FINISHED event has not been produced yet and there are resources still being loaded, or is it something else?
I don't really know the internal logic of webkit or wpe.
Maybe I'm wrong: I've just seen two crashes with wpebackend-fdo-1.8.0 (but it had been OK for hours!). Maybe there is something wrong with libwpe-1.10.0 or webkitgtk-2.32.0...
I'll try to build webkitgtk without WPE renderer to see if it will still crash.
WebKitGTK-2.32.0 built with -DWPE_RENDERER=OFF does not seem to crash. But maybe the "native" renderer is just faster (on my 4K monitor), so I didn't switch websites at the "critical" time to trigger the crash...
I think this pretty much points to an issue inside WPEBackend-fdo—I am currently investigating this issue. Thanks a lot for your bug report!
So far I have not managed to reproduce the crash with GNOME Web.
@xry111 Do you remember which web pages you were visiting to trigger the crash? I think that may help me.
I triggered the crash by accessing https://www.baidu.com, then https://webkitgtk.org, then back to https://www.baidu.com, then back to https://webkitgtk.org, ...
It seems having a page from https://www.baidu.com opened in another tab can make it crash quicker.
I can't figure out how to reproduce this, but I've hit it 21 times so far today with Tech Preview, and zero times prior to today. I think it's a regression in wpebackend-fdo 1.9.90, but I can't be sure because I would have expected to notice it earlier if so. (P.S. my "today" has been 20 minutes long thus far. This one is nasty.)
The web page doesn't seem to matter. It happens for page loads in new web views and page loads in existing web views, with and without accelerated compositing mode. I see no pattern.
I can give a better backtrace though:
Core was generated by `/usr/libexec/webkit2gtk-4.0/WebKitWebProcess 232 126'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 wl_proxy_marshal_constructor (proxy=0x0, opcode=opcode@entry=0, interface=0x7f5004cedec0 <wl_surface_interface>)
at ../src/wayland-client.c:829
829 va_start(ap, interface);
[Current thread is 1 (Thread 0x7f4e2dffb700 (LWP 135))]
(gdb) bt full
#0 wl_proxy_marshal_constructor (proxy=0x0, opcode=opcode@entry=0, interface=0x7f5004cedec0 <wl_surface_interface>)
at ../src/wayland-client.c:829
args =
{{i = -1006626928, u = 3288340368, f = -1006626928, s = 0x7f4dc4001790 "P\030", o = 0x7f4dc4001790, n = 3288340368, a = 0x7f4dc4001790, h = -1006626928}, {i = -7066208, u = 4287901088, f = -7066208, s = 0x55eaff942da0 "\220\230\225\377\352U", o = 0x55eaff942da0, n = 4287901088, a = 0x55eaff942da0, h = -7066208}, {i = -7066208, u = 4287901088, f = -7066208, s = 0x55eaff942da0 "\220\230\225\377\352U", o = 0x55eaff942da0, n = 4287901088, a = 0x55eaff942da0, h = -7066208}, {i = -6968280, u = 4287999016, f = -6968280, s = 0x55eaff95ac28 "h\341\217\377\352U", o = 0x55eaff95ac28, n = 4287999016, a = 0x55eaff95ac28, h = -6968280}, {i = 48, u = 48, f = 48, s = 0x3000000030 <error: Cannot access memory at address 0x3000000030>, o = 0x3000000030, n = 48, a = 0x3000000030, h = 48}, {i = -4594272, u = 4290373024, f = -4594272, s = 0x55eaffb9e5a0 "\200\351\316\004P\177", o = 0x55eaffb9e5a0, n = 4290373024, a = 0x55eaffb9e5a0, h = -4594272}, {i = 771729680, u = 771729680, f = 771729680, s = 0x7f4e2dffa910 "\002", o = 0x7f4e2dffa910, n = 771729680, a = 0x7f4e2dffa910, h = 771729680}, {i = -4594272, u = 4290373024, f = -4594272, s = 0x55eaffb9e5a0 "\200\351\316\004P\177", o = 0x55eaffb9e5a0, n = 4290373024, a = 0x55eaffb9e5a0, h = -4594272}, {i = -4594040, u = 4290373256, f = -4594040, s = 0x55eaffb9e688 "", o = 0x55eaffb9e688, n = 4290373256, a = 0x55eaffb9e688, h = -4594040}, {i = 879, u = 879, f = 879, s = 0x36f <error: Cannot access memory at address 0x36f>, o = 0x36f, n = 879, a = 0x36f, h = 879}, {i = -1006630000, u = 3288337296, f = -1006630000, s = 0x7f4dc4000b90 "", o = 0x7f4dc4000b90, n = 3288337296, a = 0x7f4dc4000b90, h = -1006630000}, {i = 80519796, u = 80519796, f = 80519796, s = 0x7f5004cca274 <wl_display_read_events+1124> "\213SX\307D$\f\377\377\377\377\211\020\351\035\374\377\377L\213l$(\213CX\205\300u\275\350\031\342\377\377\213", o = 0x7f5004cca274 <wl_display_read_events+1124>, n = 80519796, a = 0x7f5004cca274 <wl_display_read_events+1124>, h = 80519796}, {i = 
6, u = 6, f = 6, s = 0x55ea00000006 <error: Cannot access memory at address 0x55ea00000006>, o = 0x55ea00000006, n = 6, a = 0x55ea00000006, h = 6}, {i = 0, u = 0, f = 0, s = 0xffffffff00000000 <error: Cannot access memory at address 0xffffffff00000000>, o = 0xffffffff00000000, n = 0, a = 0xffffffff00000000, h = 0}, {i = 0, u = 0, f = 0, s = 0x0, o = 0x0, n = 0, a = 0x0, h = 0}, {i = 0, u = 0, f = 0, s = 0x0, o = 0x0, n = 0, a = 0x0, h = 0}, {i = 0, u = 0, f = 0, s = 0x0, o = 0x0, n = 0, a = 0x0, h = 0}, {i = -1522499840, u = 2772467456, f = -1522499840, s = 0x1b1c0865a5407f00 <error: Cannot access memory at address 0x1b1c0865a5407f00>, o = 0x1b1c0865a5407f00, n = 2772467456, a = 0x1b1c0865a5407f00, h = -1522499840}, {i = 0, u = 0, f = 0, s = 0x0, o = 0x0, n = 0, a = 0x0, h = 0}, {i = 0, u = 0, f = 0, s = 0x0, o = 0x0, n = 0, a = 0x0, h = 0}}
ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7f4e2dffa9a0, reg_save_area = 0x7f4e2dffa8c0}}
#1 0x00007f50083d83cd in wl_compositor_create_surface (wl_compositor=<optimized out>)
at /usr/include/wayland-client-protocol.h:1281
id = <optimized out>
display = 0x55eaffb9e5a0
registry = 0x7f4dc4001790
#2 WS::BaseTarget::initialize(WS::BaseBackend&) (this=this@entry=0x7f4dc40016e8, backend=...)
at ../src/ws-client.cpp:207
display = 0x55eaffb9e5a0
registry = 0x7f4dc4001790
#3 0x00007f50083d417a in (anonymous namespace)::Target::initialize
(height=879, width=1306, backend=..., this=<optimized out>) at ../src/renderer-backend-egl.cpp:147
backend =
@0x55eaff9b9730: {<WS::BaseBackend> = {static s_registryListener = {global = 0x7f50083d7db0 <_FUN(void*, wl_registry*, uint32_t, char const*, uint32_t)>, global_remove = 0x7f50083d7c20 <_FUN(void*, wl_registry*, uint32_t)>}, static s_bridgeListener = {implementation_info = 0x7f50083d7c30 <_FUN(void*, wpe_bridge*, uint32_t)>, connected = 0x7f50083d7da0 <WS::BaseBackend::{lambda(void*, wpe_bridge*, unsigned int)#8}::_FUN(void*, wpe_bridge*, unsigned int)>}, m_wl = {display = 0x55eaffb9e5a0, wpeBridge = 0x55eaffaa1d10}, m_type = WS::ClientImplementationType::Wayland}, m_impl = std::unique_ptr<class WS::EGLClient::BackendImpl> = {get() = 0x55eaff97c300}}
#4 fdo_renderer_backend_egl_target::{lambda(void*, fdo_renderer_backend_egl_target, unsigned int, unsigned int)#3}::operator()(fdo_renderer_backend_egl_target, fdo_renderer_backend_egl_target, unsigned int, unsigned int) const
(height=879, width=1306, backend_data=0x55eaff9b9730, data=0x7f4dc40016e0, __closure=0x0)
at ../src/renderer-backend-egl.cpp:147
backend =
@0x55eaff9b9730: {<WS::BaseBackend> = {static s_registryListener = {global = 0x7f50083d7db0 <_FUN(void*, wl_registry*, uint32_t, char const*, uint32_t)>, global_remove = 0x7f50083d7c20 <_FUN(void*, wl_registry*, uint32_t)>}, static s_bridgeListener = {implementation_info = 0x7f50083d7c30 <_FUN(void*, wpe_bridge*, uint32_t)>, connected = 0x7f50083d7da0 <WS::BaseBackend::{lambda(void*, wpe_bridge*, unsigned int)#8}::_FUN(void*, wpe_bridge*, unsigned int)>}, m_wl = {display = 0x55eaffb9e5a0, wpeBridge = 0x55eaffaa1d10}, m_type = WS::ClientImplementationType::Wayland}, m_impl = std::unique_ptr<class WS::EGLClient::BackendImpl> = {get() = 0x55eaff97c300}}
#5 fdo_renderer_backend_egl_target::{lambda(void*, fdo_renderer_backend_egl_target, unsigned int, unsigned int)#3}::_FUN(fdo_renderer_backend_egl_target, fdo_renderer_backend_egl_target, unsigned int, unsigned int) () at ../src/renderer-backend-egl.cpp:148
#6 0x00007f50093086b2 in WebKit::LayerTreeHost::nativeSurfaceHandleForCompositing() (this=0x7f4ea0026360) at /usr/include/c++/10.2.0/bits/unique_ptr.h:421
#7 0x00007f5008f7f665 in operator() (__closure=0x7f4f505962c0) at ../Source/WebKit/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:71
protectedThis = {static isRef = <optimized out>, m_ptr = 0x7f4e43a54a00}
this = 0x7f4e43a54a00
#8 WTF::Detail::CallableWrapper<WebKit::ThreadedCompositor::ThreadedCompositor(WebKit::ThreadedCompositor::Client&, WebKit::ThreadedDisplayRefreshMonitor::Client&, WebCore::PlatformDisplayID, const WebCore::IntSize&, float, WebCore::TextureMapper::PaintFlags)::<lambda()>, void>::call(void) (this=0x7f4f505962b8) at DerivedSources/ForwardingHeaders/wtf/Function.h:52
#9 0x00007f5008f7dae7 in WTF::Function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/10.2.0/bits/unique_ptr.h:421
locker = <optimized out>
#10 operator() (__closure=<optimized out>) at ../Source/WebKit/Shared/CoordinatedGraphics/threadedcompositor/CompositingRunLoop.cpp:90
locker = <optimized out>
#11 WTF::Detail::CallableWrapper<WebKit::CompositingRunLoop::performTaskSync(WTF::Function<void()>&&)::<lambda()>, void>::call(void) (this=0x7f4f505962d0) at DerivedSources/ForwardingHeaders/wtf/Function.h:52
#12 0x00007f5007b5a6f3 in WTF::Function<void ()>::operator()() const (this=<synthetic pointer>) at ../Source/WTF/wtf/Function.h:80
didSuspendFunctions = false
#13 WTF::RunLoop::performWork() (this=0x7f4f50570000) at ../Source/WTF/wtf/RunLoop.cpp:128
didSuspendFunctions = false
#14 0x00007f5007baaf5d in operator() (userData=<optimized out>, __closure=0x0) at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:80
#15 _FUN(gpointer) () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:82
#16 0x00007f5007bab883 in operator() (__closure=0x0, userData=0x7f4f50570000, callback=0x7f5007baaf50 <_FUN(gpointer)>, source=0x7f4dc40014a0) at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:53
name = 0x7f4dc4001510 "[WebKit] RunLoop work"
runLoopSource = @0x7f4dc40014a0: {source = {callback_data = 0x7f4dc4001530, callback_funcs = 0x7f500809b2c0 <g_source_callback_funcs>, source_funcs = 0x7f5007f56180 <WTF::RunLoop::s_runLoopSourceFunctions>, ref_count = 3, context = 0x7f4dc4000b90, priority = 100, flags = 35, source_id = 1, poll_fds = 0x0, prev = 0x0, next = 0x0, name = 0x7f4dc4001510 "[WebKit] RunLoop work", priv = 0x55eaffacb0d0}, runLoop = 0x7f4f50570000}
returnValue = <optimized out>
#17 _FUN(GSource*, GSourceFunc, gpointer) () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:56
#18 0x00007f5007fbc1bf in g_main_dispatch (context=0x7f4dc4000b90) at ../glib/gmain.c:3337
dispatch = 0x7f5007bab830 <_FUN(GSource*, GSourceFunc, gpointer)>
prev_source = 0x0
begin_time_nsec = 0
was_in_call = 0
user_data = 0x7f4f50570000
callback = 0x7f5007baaf50 <_FUN(gpointer)>
cb_funcs = <optimized out>
cb_data = 0x7f4dc4001530
need_destroy = <optimized out>
source = 0x7f4dc40014a0
current = 0x7f4dc4001690
i = 0
__func__ = "g_main_dispatch"
#19 g_main_context_dispatch (context=0x7f4dc4000b90) at ../glib/gmain.c:4055
#20 0x00007f5007fbc568 in g_main_context_iterate (context=0x7f4dc4000b90, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4131
max_priority = 2147483647
timeout = -1
some_ready = 1
nfds = <optimized out>
allocated_nfds = <optimized out>
fds = 0x7f4dc40015e0
#21 0x00007f5007fbc883 in g_main_loop_run (loop=loop@entry=0x7f4dc4001480) at ../glib/gmain.c:4329
__func__ = "g_main_loop_run"
#22 0x00007f5007bab9e0 in WTF::RunLoop::run() () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:108
runLoop = @0x7f4f50570000: {<WTF::FunctionDispatcher> = {<WTF::ThreadSafeRefCounted<WTF::FunctionDispatcher, (WTF::DestructionThread)0>> = {<WTF::ThreadSafeRefCountedBase> = {m_refCount = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2}, static is_always_lock_free = true}}, <No data fields>}, _vptr.FunctionDispatcher = 0x7f5007f20a90 <vtable for WTF::RunLoop+16>}, m_currentIteration = {m_start = 1, m_end = 1, m_buffer = {<WTF::VectorBufferBase<WTF::Function<void()>, WTF::FastMalloc>> = {m_buffer = 0x7f4e43a54a80, m_capacity = 16, m_size = 0}, <No data fields>}}, m_nextIterationLock = {static isHeldBit = 1 '\001', static hasParkedBit = 2 '\002', m_byte = {value = {<std::__atomic_base<unsigned char>> = {static _S_alignment = 1, _M_i = 0 '\000'}, static is_always_lock_free = true}}}, m_nextIteration = {m_start = 0, m_end = 0, m_buffer = {<WTF::VectorBufferBase<WTF::Function<void()>, WTF::FastMalloc>> = {m_buffer = 0x0, m_capacity = 0, m_size = 0}, <No data fields>}}, m_isFunctionDispatchSuspended = false, m_hasSuspendedFunctions = false, static s_runLoopSourceFunctions = {prepare = 0x0, check = 0x0, dispatch = 0x7f5007bab830 <_FUN(GSource*, GSourceFunc, gpointer)>, finalize = 0x0, closure_callback = 0x0, closure_marshal = 0x0}, m_mainContext = {m_ptr = 0x7f4dc4000b90}, m_mainLoops = {<WTF::VectorBuffer<WTF::GRefPtr<_GMainLoop>, 0, WTF::FastMalloc>> = {<WTF::VectorBufferBase<WTF::GRefPtr<_GMainLoop>, WTF::FastMalloc>> = {m_buffer = 0x7f4e40f76000, m_capacity = 16, m_size = 1}, <No data fields>}, <No data fields>}, m_source = {m_ptr = 0x7f4dc40014a0}, m_observers = {m_set = {m_impl = {static smallMaxLoadNumerator = <optimized out>, static smallMaxLoadDenominator = <optimized out>, static largeMaxLoadNumerator = <optimized out>, static largeMaxLoadDenominator = <optimized out>, static maxSmallTableCapacity = <optimized out>, static minLoad = <optimized out>, static tableSizeOffset = <optimized out>, static tableSizeMaskOffset = <optimized out>, 
static keyCountOffset = <optimized out>, static deletedCountOffset = <optimized out>, static metadataSize = <optimized out>, m_table = 0x0}}}}
mainContext = 0x7f4dc4000b90
innermostLoop = 0x7f4dc4001480
nestedMainLoop = <optimized out>
#23 0x00007f5007b5bf3d in WTF::Function<void ()>::operator()() const (this=<synthetic pointer>) at ../Source/WTF/wtf/Function.h:80
function = {m_callableWrapper = std::unique_ptr<class WTF::Detail::CallableWrapperBase<void>> = {get() = 0x7f4f505962a0}}
#24 WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) (newThreadContext=0x7f4f505caca8) at ../Source/WTF/wtf/Threading.cpp:181
function = {m_callableWrapper = std::unique_ptr<class WTF::Detail::CallableWrapperBase<void>> = {get() = 0x7f4f505962a0}}
#25 0x00007f5007badc4d in WTF::wtfThreadEntryPoint(void*) (context=<optimized out>) at ../Source/WTF/wtf/posix/ThreadingPOSIX.cpp:241
#26 0x00007f50047424d2 in start_thread (arg=<optimized out>) at pthread_create.c:477
ret = <optimized out>
pd = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139973755909888, -8255744929143707425, 140723109722702, 140723109722703, 139973755907392, 94467805741728, 8354714167187876063, 8344531645611256031}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 0
#27 0x00007f50084ed323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
OK here's something interesting:
(gdb) frame 2
#2 WS::BaseTarget::initialize (this=this@entry=0x7f4dc40016e8, backend=...) at ../src/ws-client.cpp:207
207 m_wl.surface = wl_compositor_create_surface(m_wl.compositor);
(gdb) print m_wl
$1 = {eventQueue = 0x7f4dc4001770, compositor = 0x0, wpeBridge = 0x0, wpeBridgeId = 0, surface = 0x0,
frameCallback = 0x0
(gdb) print m_glib
$3 = {socket = std::unique_ptr<FdoIPC::Connection> = {get() = 0x7f4dc4001750}, wlSource = 0x0
I was expecting this to be some complicated use-after-free or something like that, but it turns out it's a simple nullptr dereference: m_wl.compositor is nullptr when we pass it to wl_compositor_create_surface(). But as to why:
- m_wl.compositor has just not been bound yet (ws-client.cpp:247) before BaseTarget::initialize is called. But I don't see how that could happen?
- Maybe the BaseTarget has already been destroyed? BaseTarget's destructor would set m_wl.compositor to nullptr but wouldn't clear m_glib.socket, so it might be possible?
I managed to reproduce in jhbuild by loading https://duckduckgo.com in a bunch of different tabs and getting a little lucky. Instrumenting the build with printfs, I see the normal success case looks like this:
BaseTarget: 0xae19b8
initialize: 0xae19b8
operator(): target=0xae19b8: m_wl=0xae19d0 wl_compositor=0xae19d8
operator(): target=0xae19b8: m_wl=0xae19d0 wpeBridge=0xae19d8
The failing case looks like this:
BaseTarget: 0xd9cac8
initialize: 0xd9cac8
initialize: target=0xd9cac8: wl_compositor=(nil) (ABOUT TO CRASH!)
This disproves the second theory in my previous comment: BaseTarget was just constructed and is still valid; the problem is simply that m_wl.compositor was never initialized during the call to wl_display_roundtrip_queue().
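To make the failure mode concrete, here is a rough sketch of the pattern, not the actual ws-client.cpp code. GlobalEvent and RegistryTarget are hypothetical stand-ins and no real Wayland call is made: the roundtrip only dispatches whatever events the server sent back, so if nothing arrives the compositor is never bound, and initialization has to notice that instead of creating a surface from a null compositor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are real Wayland or wpebackend-fdo types.
struct GlobalEvent {
    std::string interface;
};

struct RegistryTarget {
    bool compositorBound = false; // models m_wl.compositor != nullptr
    bool hasSurface = false;      // models m_wl.surface != nullptr

    // Models the registry listener: bind the compositor when it is announced.
    void onGlobal(const GlobalEvent& event)
    {
        if (event.interface == "wl_compositor")
            compositorBound = true;
    }

    // Models the roundtrip: dispatch whatever events the server sent back.
    // If the connection is dead, `pending` is empty and nothing gets bound.
    void roundtrip(const std::vector<GlobalEvent>& pending)
    {
        for (const auto& event : pending)
            onGlobal(event);
    }

    // Models surface creation, which is only valid once the compositor is bound.
    void createSurface()
    {
        assert(compositorBound && "compositor must be bound first");
        hasSurface = true;
    }

    // Returns false instead of creating a surface from an unbound compositor.
    bool initialize(const std::vector<GlobalEvent>& pending)
    {
        roundtrip(pending);
        if (!compositorBound)
            return false;
        createSurface();
        return true;
    }
};
```

Calling initialize() with the two globals seen in the good case succeeds, while calling it with an empty event list (the bad case) returns false rather than crashing. A guard like this would only hide the symptom; it does not explain why the events never arrive.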
A bit more debugging. This time I added one more print in Instance::Instance (ws.cpp), in the UI process. The good case looks like this:
BaseTarget: 0x2705408
initialize: 0x2705408
operator(): target=0x2705408: m_wl=0x2705420 wl_compositor=0x2705428
operator(): target=0x2705408: m_wl=0x2705420 wpeBridge=0x2705428
operator(): id=9 wl_global_create cb: set s_compositorInterface implementation!
Same as before, just one more line this time. The bad case, however, is unchanged. In the bad case, somehow the UI process didn't get the memo that the web process has connected and it must send global events (as described here). The compositor object of course exists properly in the UI process, otherwise all web processes would be crashing. I think there is some sort of Wayland protocol communication issue that I do not yet understand.
Adrian suggested WAYLAND_DEBUG=1. Here is the good case where we do not crash:
BaseTarget: 0x1acba38
initialize: 0x1acba38
[1603106.787] -> wl_display@1.get_registry(new id wl_registry@5)
[1603106.816] -> wl_display@1.sync(new id wl_callback@8)
[1603106.884] wl_display@1.get_registry(new id wl_registry@5)
[1603106.911] -> wl_registry@5.global(1, "wl_compositor", 3)
[1603106.924] -> wl_registry@5.global(2, "wpe_bridge", 1)
[1603106.936] -> wl_registry@5.global(3, "wl_shm", 1)
[1603106.948] -> wl_registry@5.global(4, "wl_drm", 2)
[1603106.962] -> wl_registry@5.global(5, "zwp_linux_dmabuf_v1", 3)
[1603106.977] wl_display@1.sync(new id wl_callback@8)
[1603106.984] -> wl_callback@8.done(0)
[1603106.990] -> wl_display@1.delete_id(8)
[1603107.048] wl_display@1.delete_id(8)
[1603107.066] wl_registry@5.global(1, "wl_compositor", 3)
[1603107.091] -> wl_registry@5.bind(1, "wl_compositor", 1, new id [unknown]@9)
operator(): target=0x1acba38: m_wl=0x1acba50 wl_compositor=0x1acba58
[1603107.131] wl_registry@5.global(2, "wpe_bridge", 1)
[1603107.151] -> wl_registry@5.bind(2, "wpe_bridge", 1, new id [unknown]@10)
operator(): target=0x1acba38: m_wl=0x1acba50 wpeBridge=0x1acba58
[1603107.182] wl_registry@5.global(3, "wl_shm", 1)
[1603107.200] wl_registry@5.global(4, "wl_drm", 2)
[1603107.221] wl_registry@5.global(5, "zwp_linux_dmabuf_v1", 3)
[1603107.240] wl_callback@8.done(0)
[1603107.252] -> wl_compositor@9.create_surface(new id wl_surface@8)
[1603107.308] -> wpe_bridge@10.connect(wl_surface@8)
[1603107.319] -> wl_display@1.sync(new id wl_callback@11)
[1603107.364] wl_registry@5.bind(1, "wl_compositor", 1, new id [unknown]@9)
operator(): id=9 wl_global_create cb: set s_compositorInterface implementation!
Here is the bad case where we crash:
BaseTarget: 0x288eb28
initialize: 0x288eb28
[1615593.282] -> wl_display@1.get_registry(new id wl_registry@12)
[1615593.298] -> wl_display@1.sync(new id wl_callback@11)
initialize: target=0x288eb28: wl_compositor=(nil) (ABOUT TO CRASH!)
You can see a pretty big difference there....
Bisecting leads to:
9a0e6cb62bfae44ef3496d950899ecb34ddd24a5 is the first bad commit
commit 9a0e6cb62bfae44ef3496d950899ecb34ddd24a5
Author: Pablo Saavedra <psaavedra@igalia.com>
Date: Mon Nov 30 18:57:12 2020 +0000
ws: Call wl_client_destroy for the created wl_client in Instance::unregisterViewBackend()
src/ws.cpp | 1 +
1 file changed, 1 insertion(+)
Although I'm very confident this is the first bad commit, there is a caveat: I counted a commit as bad if it showed either this web process crash or the recent UI process freezes fixed by 99bd040, because there's no practical way to determine whether any commit affected by the UI process freeze is also affected by this web process crash. So it's not a very useful bisect other than to tell us what the last good state of wpebackend-fdo was. But all commits since then are obviously bad in that I'll either get a web process crash or UI process hang within the first 30 seconds or so of attempting to reproduce. My current reproducer is to open Epiphany with one browser tab and just cycle between loading https://duckduckgo.com and https://youtube.com.
So far I have not managed to reproduce the crash with GNOME Web.
In Epiphany, try:
- load https://duckduckgo.com
- load https://youtube.com in the same tab
- load https://duckduckgo.com again
This seems to always crash.
I wound up debugging wl_connection. Good case:
BaseTarget: this=0x22f5788 (pid=229728) (tid=229844)
wl_connection_write: connection=0x21bb570 (pid=229728 tid=229844)
initialize: this=0x22f5788 backend=0x1fd9970 display=0x21bb410 registry=0x2323de0 (pid=229728) (tid=229844)
wl_connection_write: connection=0x21bb570 (pid=229728 tid=229844)
wl_connection_flush: connection=0x21bb570 (pid=229728 tid=229844)
wl_connection_write: connection=0x187e650 (pid=229554 tid=229681)
wl_connection_write: connection=0x187e650 (pid=229554 tid=229681)
~BaseTarget: this=0x1ab1c98 (pid=229554) (tid=229681)
You can see the web process (ws-client.cpp) writing to the UI process (ws.cpp), and the UI process writing back. I'm a little surprised that the UI process does not use wl_connection_read, but I guess it does so some other way. It's definitely reading events from the web process because it notices that the web process created a wl_registry and it sends global events back, causing m_wl.compositor to be initialized in the web process.
Bad case:
BaseTarget: this=0x16e7d28 (pid=229554) (tid=229964)
wl_connection_write: connection=0x187e650 (pid=229554 tid=229964)
initialize: this=0x16e7d28 backend=0x18059d0 display=0x187e4f0 registry=0x1ff94e0 (pid=229554) (tid=229964)
wl_connection_write: connection=0x187e650 (pid=229554 tid=229964)
wl_connection_flush: connection=0x187e650 (pid=229554 tid=229964)
wl_connection_flush: connection=0x187e650 STRANGE_BAD_ERROR=Broken pipe (pid=229554 tid=229964)
wl_connection_read: connection=0x187e650 (pid=229554 tid=229964)
wl_connection_read: connection=0x187e650 len<=0, bailing out (pid=229554 tid=229964)
initialize: this=0x16e7d28 ABOUT TO CRASH! (pid=229554)
The UI Process (ws.cpp) does not send any global events because we fail to write to it... because the connection to the server has been closed somehow (broken pipe). Some recent regression in wpebackend-fdo -- likely in the UI process code -- must be responsible for closing the pipe between the UI process and the web process.
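For anyone following along, the "Broken pipe" and "len<=0" lines above are what a plain Unix socket reports once the peer has closed its end. Here is a minimal sketch, assuming Linux and a bare socketpair() rather than the real wl_connection code; PeerCloseResult and observePeerClose are names made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: what the surviving end of the socket observes.
struct PeerCloseResult {
    int sendErrno;      // errno from send(), or 0 if send() succeeded
    ssize_t recvLength; // return value of recv()
};

PeerCloseResult observePeerClose()
{
    int fds[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rc == 0);
    (void)rc;

    // The "UI process" side closes its end of the connection.
    close(fds[1]);

    // The "web process" side then tries to send a request.
    // MSG_NOSIGNAL reports the failure as EPIPE instead of raising SIGPIPE.
    const char request[] = "get_registry";
    ssize_t written = send(fds[0], request, sizeof(request), MSG_NOSIGNAL);
    int sendErrno = (written < 0) ? errno : 0;

    // Reading afterwards yields no data; MSG_DONTWAIT keeps this from blocking.
    char reply[64];
    ssize_t received = recv(fds[0], reply, sizeof(reply), MSG_DONTWAIT);

    close(fds[0]);
    return PeerCloseResult { sendErrno, received };
}
```

On Linux this should report EPIPE ("Broken pipe") for the send and a non-positive length for the read, which matches the two wl_connection lines in the bad case.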
So I asked myself: where does the pipe between web process and UI process get closed? It is closed by the UI process when it calls wl_client_destroy() in view-backend-private.cpp. Indeed, this fixes the bug:
diff --git a/src/view-backend-private.cpp b/src/view-backend-private.cpp
index 78f0e2f..b2abf93 100644
--- a/src/view-backend-private.cpp
+++ b/src/view-backend-private.cpp
@@ -161,11 +161,11 @@ void ViewBackend::unregisterSurface(uint32_t surfaceId)
// Destroying the client triggers the m_clientDestroy callback,
// the rest of the teardown is done from there.
- wl_client_destroy(m_client);
+// wl_client_destroy(m_client);
// After destroying the client, none of these can be valid.
- g_assert(m_client == nullptr);
- g_assert(m_surfaceId == 0);
+// g_assert(m_client == nullptr);
+// g_assert(m_surfaceId == 0);
}
void ViewBackend::didReceiveMessage(uint32_t messageId, uint32_t messageBody)
Of course that's surely not the right fix, and it causes the UI process to crash on shutdown, but it fixes the crashes during regular browsing, so that's something.
Anyway, I think we can be confident that something is wrong with 99bd040. Adrian, any thoughts on this?
OK, the problem is that the UI process (ws.cpp, view-backend-private.cpp) is calling wl_client_destroy() too soon, closing its connection to the web process. When the web process is reused after a process swap, its wl_display is unexpectedly no longer connected, and so wl_connection silently discards our events.
To fix this, I have to revert 99bd040, f54135e, and d086b69. With all three commits reverted, wpebackend-fdo seems to be back to a good working state. At least, this particular bug goes away.
Some observations: each wl_client in the UI process (ws.cpp, view-backend-private.cpp) corresponds to an entire web process (ws-client.cpp). The Wayland IPC connection is owned by each wl_client. Surface unregistration corresponds to the destruction of a BaseTarget in the web process, but the Wayland IPC connection is owned by the BaseBackend in the web process, not the BaseTarget. Each BaseBackend can have an unlimited number of BaseTargets. When a BaseTarget is destroyed, it unregisters its surface, but the BaseBackend stays around: it will create another BaseTarget later if we process swap back to its web process.
Conclusion: the UI process must call wl_client_destroy() only in response to the destruction of BaseBackend, not in response to the destruction of BaseTarget (or surface deregistration).
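The ownership rule can be sketched with a toy model. This is only an illustration of the lifetime relationship described above, not the real wpebackend-fdo classes or the eventual fix; ClientConnection, BackendModel, and TargetModel are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical model of the UI-process side of one web process connection.
struct ClientConnection {
    bool open = true;           // false once the client has been destroyed
    int registeredSurfaces = 0; // surfaces currently registered on it
};

// Models the per-web-process BaseBackend, which owns the IPC connection.
class BackendModel {
public:
    explicit BackendModel(std::shared_ptr<ClientConnection> connection)
        : m_connection(std::move(connection))
    {
    }

    // Only backend destruction closes the connection (models wl_client_destroy()).
    ~BackendModel() { m_connection->open = false; }

    ClientConnection& connection() { return *m_connection; }

private:
    std::shared_ptr<ClientConnection> m_connection;
};

// Models a BaseTarget: it borrows the backend's connection and only
// registers and unregisters a surface on it.
class TargetModel {
public:
    explicit TargetModel(BackendModel& backend)
        : m_backend(backend)
    {
        assert(m_backend.connection().open);
        ++m_backend.connection().registeredSurfaces;
    }

    // Unregistering the surface leaves the connection open for the next target.
    ~TargetModel() { --m_backend.connection().registeredSurfaces; }

private:
    BackendModel& m_backend;
};
```

With this shape, creating and destroying several targets in a row on the same backend never closes the connection; it is closed exactly once, when the backend itself goes away.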
CC @psaavedra
I have been using a build with #153 applied (which builds on top of #146 and #151) and it looks like we are now on the way to having this issue fixed 🛠️
Many thanks to @mcatanzaro for all the back-and-forth discussions while working on a fix and for his attempt at a patch in #148; to @psaavedra for helping out with the testing; and to @zdobersek for the code reviews 🤗
Are the reviewers (not) happy with the patches?
I'm not a reviewer (I don't understand the internal logic of a browser at all :) but as a user I'm quite happy with the fix.
@aperezdc any heads-up?
Maybe I'm wrong: I've just seen two crashes with wpebackend-fdo-1.8.0 (but it had been OK for hours!). Maybe there is something wrong with libwpe-1.10.0 or webkitgtk-2.32.0...
Update on this: wpebackend-fdo-1.8.0 didn't crash. The crash I saw was caused by an ldconfig run, which reset my libWPEBackend-fdo.so.1 symlink to point to a new (and buggy) version of the wpebackend-fdo shared lib.
@xry111 Thanks for reporting that the fix works for you, that's good to know!
@flaphoschi There is still one small regression that we found during our testing on an embedded ARM i.MX board, which causes the contents of web views to not be updated in some cases; I am trying to get to the bottom of it before merging the fix, but it's hard to reproduce and investigate.
Personally I am thinking of proposing that we merge #153 already now so I can make a new release candidate (that would be 1.9.91) and continue investigating later. I think most users and distributions would benefit from having a release that includes a fix that is definitely an improvement for 99% of the users out there, and that completely fixes the problems for WebKitGTK.
Would it be possible to get it into a 1.8.4 as well? Unless you think 1.10.0 is just around the corner.
@heftig Yes, I am planning to backport the fixes to the 1.8 branch and make one or two more 1.8.x releases.
Thanks for your work 🙂
Personally I am thinking of proposing that we merge #153 already now so I can make a new release candidate (that would be 1.9.91) and continue investigating later. I think most users and distributions would benefit from having a release that includes a fix that is definitely an improvement for 99% of the users out there, and that completely fixes the problems for WebKitGTK.
I've been prodding you to do this to the point it has probably turned into borderline harassment. :P Even if something unknown is still wrong with #153, it resolves the emergency regression and should go out ASAP. We've left users with borderline-unusable browsers in the meantime.
@mcatanzaro: A maybe off-topic question: is the WPE backend considered the "official" implementation for WebKitGTK, and the "old" backend considered "alternative" or "deprecated" now? If so, we may update the Beyond Linux From Scratch book to add libwpe & wpebackend-fdo.
Yes absolutely, the USE_WPE_RENDERER setting should only be disabled for older distros that predate libwpe and wpebackend-fdo. You should definitely move to libwpe and wpebackend-fdo, unless you want your own separate set of bugs.
Can we have a 1.8.4 release for this, please?