Swapchain Creation fails on Wayland (mutter) when nvidia is not the primary interface
tim-rex opened this issue · 6 comments
In a similar vein to #94, I am also seeing vkCreateSwapchainKHR fail with VK_ERROR_INITIALIZATION_FAILED.
I also notice that setting EGL_LOG_LEVEL=debug causes the following to be logged when this failure occurs, which may provide a clue:
libEGL debug: EGL user error 0x3004 (EGL_BAD_ATTRIBUTE) in eglGetPlatformDisplay
FWIW, I'm not explicitly calling eglGetPlatformDisplay or any other egl functionality in this application.
However, unlike issue #94, this is not remediated by setting nvidia_drm modeset=1 (which is already enabled), per this comment.
vkCreateSwapchainKHR is being called with the following createInfo struct:
sType:VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR
pNext:0x0
flags:0
surface:0xfd5b260000000001
minImageCount:3
imageFormat:VK_FORMAT_B8G8R8A8_UNORM
imageColorSpace:VK_COLOR_SPACE_SRGB_NONLINEAR_KHR
imageExtent: 800x600
imageArrayLayers:1
imageUsage:16
imageSharingMode:VK_SHARING_MODE_EXCLUSIVE
queueFamilyIndexCount:0
pQueueFamilyIndices:0x0
preTransform:VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR
compositeAlpha:VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR
presentMode:VK_PRESENT_MODE_MAILBOX_KHR
clipped:1
oldSwapchain:0x0
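As a reading aid for the dump above, the numeric imageUsage value can be decoded into flag names. This is a minimal sketch, assuming the bit values from the Vulkan specification's VkImageUsageFlagBits enum; the helper name decode_image_usage is made up for illustration and is not part of the application.

```python
# Bit values taken from the Vulkan spec's VkImageUsageFlagBits (core flags only).
IMAGE_USAGE_BITS = {
    0x01: "VK_IMAGE_USAGE_TRANSFER_SRC_BIT",
    0x02: "VK_IMAGE_USAGE_TRANSFER_DST_BIT",
    0x04: "VK_IMAGE_USAGE_SAMPLED_BIT",
    0x08: "VK_IMAGE_USAGE_STORAGE_BIT",
    0x10: "VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT",
    0x20: "VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT",
    0x40: "VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT",
    0x80: "VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT",
}


def decode_image_usage(mask):
    """Return the names of all known usage bits set in mask (hypothetical helper)."""
    names = [name for bit, name in IMAGE_USAGE_BITS.items() if mask & bit]
    unknown = mask & ~sum(IMAGE_USAGE_BITS)  # bits this table does not cover
    if unknown:
        names.append(hex(unknown))
    return names


# imageUsage:16 from the createInfo dump above
print(decode_image_usage(16))
```

So imageUsage:16 is plain colour-attachment usage, which is about as ordinary a swapchain request as it gets.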
Importantly..
I'm running a dual GPU system with nvidia + amdgpu under Gnome Wayland.
This only seems to occur when Gnome is using amdgpu as the primary interface. Swapchain creation seems fine when nvidia is the primary interface, or when it is the only interface in use.
Fedora Linux 39 (Workstation Edition)
Linux 6.5.11-300.fc39.x86_64
GNOME Version 45.1
nVidia Driver version 535.129.03
Output of eglinfo attached
eglinfo.txt
Some interesting observations.. probably unrelated
When this occurs WAYLAND_DEBUG emits the following:
[ 922868.934] wl_callback@60.done(7540)
[ 922868.946] -> wl_display@1.sync(new id wl_callback@60)
[ 922869.098] wl_display@1.delete_id(60)
[ 922869.102] wl_drm@24.device("/dev/dri/renderD128")
[ 922869.108] wl_drm@24.format(808669761)
[ 922869.111] wl_drm@24.format(808669784)
[ 922869.116] wl_drm@24.format(808665665)
[ 922869.120] wl_drm@24.format(808665688)
[ 922869.124] wl_drm@24.format(875713089)
[ 922869.128] wl_drm@24.format(875713112)
[ 922869.132] wl_drm@24.format(909199186)
[ 922869.136] wl_drm@24.format(961959257)
[ 922869.139] wl_drm@24.format(825316697)
[ 922869.142] wl_drm@24.format(842093913)
[ 922869.145] wl_drm@24.format(909202777)
[ 922869.148] wl_drm@24.format(875713881)
[ 922869.151] wl_drm@24.format(842094158)
[ 922869.154] wl_drm@24.format(909203022)
[ 922869.157] wl_drm@24.format(1448695129)
[ 922869.160] wl_drm@24.capabilities(1)
[ 922869.163] wl_callback@60.done(7540)
libEGL debug: EGL user error 0x3004 (EGL_BAD_ATTRIBUTE) in eglGetPlatformDisplay
In particular, that reference to /dev/dri/renderD128 is confusing, as that is my AMD device, even though I am using an nVidia logical device in my Vulkan initialisation.
/dev/dri/by-path/pci-0000:01:00.0-card -> ../card1
/dev/dri/by-path/pci-0000:01:00.0-render -> ../renderD129
/dev/dri/by-path/pci-0000:02:00.0-card -> ../card0
/dev/dri/by-path/pci-0000:02:00.0-render -> ../renderD128
/sys/class/drm/renderD128/device/driver -> ../../../../bus/pci/drivers/amdgpu
/sys/class/drm/renderD129/device/driver -> ../../../../bus/pci/drivers/nvidia
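The mapping above was read from the symlinks by hand. For anyone wanting to check their own system, here is a rough sketch that walks the same sysfs links; the function name render_node_drivers and its sysfs_drm parameter are hypothetical, and the layout is assumed to match the listing above.

```python
import os
from pathlib import Path


def render_node_drivers(sysfs_drm="/sys/class/drm"):
    """Map each renderD* node to the kernel driver its device/driver link points at."""
    result = {}
    for node in sorted(Path(sysfs_drm).glob("renderD*")):
        driver_link = node / "device" / "driver"
        if driver_link.exists():
            # The link target's last path component is the driver name.
            result[node.name] = os.path.basename(os.path.realpath(driver_link))
    return result


if __name__ == "__main__":
    for node, driver in render_node_drivers().items():
        print(f"/dev/dri/{node} -> {driver}")
```

On the system described here this would be expected to pair renderD128 with amdgpu and renderD129 with nvidia, matching the listing.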
And for the sake of experimentation..
Setting the following does allow me to proceed further..
DRI_PRIME=pci-0000_01_00_0 __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia
Swapchain creation succeeds, but it ultimately fails on vkQueuePresentKHR with
[ 439850.098] -> zwp_linux_dmabuf_v1@52.create_params(new id zwp_linux_buffer_params_v1@44)
[ 439850.107] -> zwp_linux_buffer_params_v1@44.add(fd 42, 0, 0, 3200, 50331648, 5234708)
[ 439850.111] -> zwp_linux_buffer_params_v1@44.create_immed(new id wl_buffer@40, 800, 600, 875713112, 0)
[ 439850.115] -> zwp_linux_buffer_params_v1@44.destroy()
[ 439850.121] -> wl_surface@16.attach(wl_buffer@40, 0, 0)
[ 439850.124] -> wl_surface@16.damage(0, 0, 800, 600)
[ 439850.127] -> wl_surface@16.commit()
[ 439850.130] -> wl_display@1.sync(new id wl_callback@36)
[ 439850.621] wl_display@1.error(nil, 7, "failed to import supplied dmabufs: Arguments are inconsistent (for example, a valid context requires buffers not supplied by a ")
[destroyed object]: error 7: failed to import supplied dmabufs: Arguments are inconsistent (for example, a valid context requires buffers not supplied by a
Wayland bailed!! errno=71 : Protocol error
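The numbers in the zwp_linux_buffer_params_v1 requests above can be decoded to see what kind of buffer is being handed to the compositor. This is a minimal sketch, assuming the argument order from the linux-dmabuf protocol (fd, plane_idx, offset, stride, modifier_hi, modifier_lo) and the fourcc and modifier-vendor encoding from the kernel's drm_fourcc.h; the helper names are made up for illustration.

```python
# Vendor IDs from drm_fourcc.h (top 8 bits of a 64-bit format modifier); subset only.
DRM_FORMAT_MOD_VENDORS = {0x00: "NONE", 0x01: "INTEL", 0x02: "AMD", 0x03: "NVIDIA"}


def fourcc_to_str(code):
    """DRM fourcc codes are four ASCII characters packed little-endian."""
    return code.to_bytes(4, "little").decode("ascii")


def modifier_from_halves(hi, lo):
    """The protocol splits the 64-bit modifier into two 32-bit halves."""
    return (hi << 32) | lo


def modifier_vendor(modifier):
    vendor_id = modifier >> 56
    return DRM_FORMAT_MOD_VENDORS.get(vendor_id, hex(vendor_id))


# Values copied from the add(...) and create_immed(...) lines above.
stride, mod_hi, mod_lo = 3200, 50331648, 5234708
width, fmt = 800, 875713112

modifier = modifier_from_halves(mod_hi, mod_lo)
print(fourcc_to_str(fmt))         # pixel format of the buffer
print(hex(modifier))              # full 64-bit modifier
print(modifier_vendor(modifier))  # which vendor defines this layout
print(stride == width * 4)        # 4 bytes per pixel at 800 wide
```

The format decodes to XR24 (XRGB8888) and the modifier's vendor byte is NVIDIA's. That would be consistent with a compositor running on amdgpu being unable to import the buffer, though this is only an interpretation of the log, not something confirmed in this thread.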
The only reference I can find for eglGetPlatformDisplay returning EGL_BAD_ATTRIBUTE is in the EGL_EXT_explicit_device extension notes:
If EGL_EXT_platform_device is supported, passing EGL_DEVICE_EXT as an attribute to eglGetPlatformDisplay(EGL_PLATFORM_DEVICE_EXT) generates EGL_BAD_ATTRIBUTE.
The only reference I can find for eglGetPlatformDisplay to return EGL_BAD_ATTRIBUTE is noted in the EGL_EXT_explicit_device extension notes
EGL_BAD_ATTRIBUTE is a generic error for any case where the implementation doesn't recognize an attribute enum. From the EGL spec, section 3.1:
EGL_BAD_ATTRIBUTE
An unrecognized attribute or attribute value was passed in an attribute list. Any command taking an attribute parameter or attribute list may generate this error.
Unfortunately, that doesn't tell us what's calling eglGetPlatformDisplay or what the offending attribute is...
I can pull on that thread..
Here's the call stack when eglGetPlatformDisplay gets called
#0 eglGetPlatformDisplay (platform=12760, native_display=0x6564090, attrib_list=0x7fff89b93200) at /usr/src/debug/libglvnd-1.7.0-1.fc39.x86_64/src/EGL/libegl.c:409
#1 0x00007fff22e01fe0 in ProducerInit () from /lib64/libnvidia-vulkan-producer.so
#2 0x00007fff32a19872 in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#3 0x00007fff32a43bbf in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#4 0x00007fff32a67bdd in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#5 0x00007fff88453b20 in ?? () from /lib64/libGLX_nvidia.so.0
#6 0x00007fff885d0fb7 in terminator_CreateSwapchainKHR (device=0x7fff85c5e430, pCreateInfo=0x7fff85c3f050, pAllocator=0x0, pSwapchain=0xa34520 <vulkan+6352>) at /vulkan-sdk/1.3.268.0/source/Vulkan-Loader/loader/wsi.c:499
#7 0x00007fff24f92b9d in DispatchCreateSwapchainKHR (device=device@entry=0x7fff85c5e430, pCreateInfo=pCreateInfo@entry=0x7fff89b93880, pAllocator=pAllocator@entry=0x0, pSwapchain=pSwapchain@entry=0xa34520 <vulkan+6352>)
at /vulkan-sdk/1.3.268.0/source/Vulkan-ValidationLayers/layers/vulkan/generated/vk_safe_struct.h:4590
#8 0x00007fff24e79ab3 in vulkan_layer_chassis::CreateSwapchainKHR (device=0x7fff85c5e430, pCreateInfo=0x7fff89b93880, pAllocator=0x0, pSwapchain=0xa34520 <vulkan+6352>)
at /vulkan-sdk/1.3.268.0/source/Vulkan-ValidationLayers/layers/vulkan/generated/chassis.cpp:5714
#9 0x00000000005a7bac in VulkanWrapper::createSwapChain (this=0xa32c50 <vulkan>, swapchainSupport=..., surfaceFormat=..., surfaceFormat2=..., preferredPresentMode=VK_PRESENT_MODE_MAILBOX_KHR) at ./common/vulkan_helper.cpp:1644
#10 0x00000000005940c4 in VulkanWrapper::createSwapChain (this=0xa32c50 <vulkan>, swapchainSupport=..., surfaceFormat=..., surfaceFormat2=...) at ./common/vulkan_helper.cpp:1471
#11 0x000000000056e454 in VulkanWrapper::initVulkan (this=0xa32c50 <vulkan>, hWnd=0x6569e70) at ./common/vulkan_helper.cpp:5612
#12 0x00000000004f7838 in processRenderEvents () at ./core/render.cpp:155
#13 0x000000000050312b in update_loop () at ./core/main.cpp:917
#14 0x0000000000503ab5 in main_loop (argc=0, argv=0x0) at ./core/main.cpp:1106
#15 0x000000000040f22d in main_loop_bootstrap () at ./platform/Linux/linux_main.cpp:803
#16 0x00007ffff77e4897 in start_thread (arg=<optimized out>) at pthread_create.c:444
#17 0x00007ffff786b6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The attrib_list at the call site is
(gdb) p/x *attrib_list@3
$43 = {0x3352, 0x1, 0x3038}
That would seem to map to:
EGL_TRACK_REFERENCES_KHR, EGL_TRUE, EGL_NONE
The EGL_KHR_display_reference extension indicates:
An EGL_BAD_ATTRIBUTE error is generated if the requested value for EGL_TRACK_REFERENCES_KHR is not supported.
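The decoding of the attribute list can be reproduced mechanically. This is a small sketch, assuming the enum values from the Khronos EGL headers (EGL_NONE 0x3038, EGL_TRACK_REFERENCES_KHR 0x3352, EGL_PLATFORM_WAYLAND_KHR 0x31D8, EGL_BAD_ATTRIBUTE 0x3004); parse_attrib_list is a hypothetical helper, not an EGL function.

```python
EGL_NONE = 0x3038

# Only the enums that appear in this issue; values from the Khronos EGL headers.
EGL_NAMES = {
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3038: "EGL_NONE",
    0x31D8: "EGL_PLATFORM_WAYLAND_KHR",
    0x3352: "EGL_TRACK_REFERENCES_KHR",
}


def parse_attrib_list(words):
    """Split an EGL_NONE-terminated attribute list into (name, value) pairs."""
    pairs = []
    i = 0
    while i < len(words) and words[i] != EGL_NONE:
        if i + 1 >= len(words):
            raise ValueError("attribute without a value")
        pairs.append((EGL_NAMES.get(words[i], hex(words[i])), words[i + 1]))
        i += 2
    return pairs


# attrib_list as printed by gdb, and the platform argument from frame #0
print(parse_attrib_list([0x3352, 0x1, 0x3038]))
print(EGL_NAMES[12760])
```

The platform argument 12760 (0x31D8) decodes to EGL_PLATFORM_WAYLAND_KHR, so the call asks for a Wayland platform display with reference tracking enabled.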
Stepping through from there, EGL_BAD_ATTRIBUTE is generated from the following frame
(gdb) p/x *attrib_list@3
$2 = {0x3352, 0x1, 0x3038}
(gdb) frame
#0 _eglGetWaylandDisplay (native_display=0x6564090, attrib_list=0x7fff89b93200) at ../src/egl/main/egldisplay.c:535
535 _eglError(EGL_BAD_ATTRIBUTE, "eglGetPlatformDisplay");
with the trace
(gdb) bt
#0 _eglGetWaylandDisplay (native_display=0x6564090, attrib_list=0x7fff89b93200) at ../src/egl/main/egldisplay.c:535
#1 0x00007ffff7bf4fd5 in GetPlatformDisplayCommon (platform=12760, native_display=0x6564090, attrib_list=0x7fff89b93200, funcName=0x7ffff7bfb2da "eglGetPlatformDisplay")
at /usr/src/debug/libglvnd-1.7.0-1.fc39.x86_64/src/EGL/libegl.c:324
#2 0x00007fff22e01fe0 in ProducerInit () from /lib64/libnvidia-vulkan-producer.so
#3 0x00007fff32a19872 in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#4 0x00007fff32a43bbf in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#5 0x00007fff32a67bdd in ?? () from /lib64/libnvidia-glcore.so.535.129.03
#6 0x00007fff88453b20 in ?? () from /lib64/libGLX_nvidia.so.0
#7 0x00007fff885d0fb7 in terminator_CreateSwapchainKHR (device=0x7fff85c5f000, pCreateInfo=0x7fff85c3fc50, pAllocator=0x0, pSwapchain=0xa34520 <vulkan+6352>) at /vulkan-sdk/1.3.268.0/source/Vulkan-Loader/loader/wsi.c:499
#8 0x00007fff24f92b9d in DispatchCreateSwapchainKHR (device=device@entry=0x7fff85c5f000, pCreateInfo=pCreateInfo@entry=0x7fff89b93880, pAllocator=pAllocator@entry=0x0, pSwapchain=pSwapchain@entry=0xa34520 <vulkan+6352>)
at /vulkan-sdk/1.3.268.0/source/Vulkan-ValidationLayers/layers/vulkan/generated/vk_safe_struct.h:4590
#9 0x00007fff24e79ab3 in vulkan_layer_chassis::CreateSwapchainKHR (device=0x7fff85c5f000, pCreateInfo=0x7fff89b93880, pAllocator=0x0, pSwapchain=0xa34520 <vulkan+6352>)
at /vulkan-sdk/1.3.268.0/source/Vulkan-ValidationLayers/layers/vulkan/generated/chassis.cpp:5714
#10 0x00000000005a7bac in VulkanWrapper::createSwapChain (this=0xa32c50 <vulkan>, swapchainSupport=..., surfaceFormat=..., surfaceFormat2=..., preferredPresentMode=VK_PRESENT_MODE_MAILBOX_KHR) at ./common/vulkan_helper.cpp:1644
#11 0x00000000005940c4 in VulkanWrapper::createSwapChain (this=0xa32c50 <vulkan>, swapchainSupport=..., surfaceFormat=..., surfaceFormat2=...) at ./common/vulkan_helper.cpp:1471
#12 0x000000000056e454 in VulkanWrapper::initVulkan (this=0xa32c50 <vulkan>, hWnd=0x6569e70) at ./common/vulkan_helper.cpp:5612
#13 0x00000000004f7838 in processRenderEvents () at ./core/render.cpp:155
#14 0x000000000050312b in update_loop () at ./core/main.cpp:917
#15 0x0000000000503ab5 in main_loop (argc=0, argv=0x0) at ./core/main.cpp:1106
#16 0x000000000040f22d in main_loop_bootstrap () at ./platform/Linux/linux_main.cpp:803
#17 0x00007ffff77e4897 in start_thread (arg=<optimized out>) at pthread_create.c:444
#18 0x00007ffff786b6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Our Vulkan Wayland WSI has been pretty much entirely re-written for the 545 release, so it might be worth checking if updating fixes the issue.
I'm unable to validate on my Fedora 39 setup at this time, but I've tried to reproduce it on a freshly installed Arch Linux setup on the same machine using the latest 545.29.06 drivers, and I am running into different (earlier) issues.
Raised #96
I'll update here when I can confirm with newer drivers on F39