servo/surfman

`threads` example displays a black window on X11

hecrj opened this issue · 14 comments

hecrj commented

The threads example in master displays a black window on X11. I'm using Arch Linux with a GTX 2080 Ti and the NVIDIA proprietary drivers.

The offscreen example works correctly.

Let me know if you need more details.

I get an IncompatibleWinitWindow crash on:

  • KDE, X11, Debian Testing, Intel Iris 6100 (Broadwell GT3)
  • Ubuntu 20.04, Nvidia GTX 1060, proprietary + Mesa/Nouveau drivers

Servo seems to be affected by this as well: servo/servo#26353, servo/servo#26400

@pcwalton doesn't watch this repo.

KDE, X11, Debian Testing, Intel Iris 6100 (Broadwell GT3)
I ran `git bisect` with `cd surfman; cargo +nightly run --example threads; cd ..` as the test command:
Before 2925168 I got:

libEGL warning: FIXME: egl/x11 doesn't support front buffer rendering.
Failed to compile shader:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, 3.00 ES, 3.10 ES, and 3.20 ES

thread 'main' panicked at 'Shader compilation failed!', surfman/examples/common/mod.rs:75:17
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Since then it is:

thread 'main' panicked at 'called Result::unwrap() on an Err value: IncompatibleWinitWindow', surfman/examples/threads.rs:98:22


With `cargo +nightly run --example threads --features sm-x11` I get the old error back:

libEGL warning: FIXME: egl/x11 doesn't support front buffer rendering.
Failed to compile shader:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, 3.00 ES, 3.10 ES, and 3.20 ES

thread 'main' panicked at 'Shader compilation failed!', surfman/examples/common/mod.rs:75:17
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

GLApi::GL => source.extend_from_slice(b"#version 330\n"),

version: GLVersion::new(3, 0),

It works if I run `MESA_GL_VERSION_OVERRIDE=3.2 cargo +nightly run --example threads --features sm-x11`. If I then want to run it with different settings, I have to clear `~/.cache/mesa_shader_cache` first; otherwise the shaders don't seem to be recompiled. So Nvidia might just be silently failing (?) while Mesa told us what the problem is.
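The Mesa error is consistent with the two snippets above: the shader source asks for `#version 330`, while the context is requested as OpenGL 3.0, which only guarantees GLSL 1.30. Below is a minimal sketch of deriving the directive from the context version instead of hard-coding it. `glsl_version_directive` is a hypothetical helper and not part of surfman, and the shader bodies would also have to be valid in the older dialect.

```rust
/// Returns the `#version` directive matching a desktop OpenGL context of
/// version `major.minor`, or `None` if the version is not covered here.
/// Hypothetical helper for illustration; not a surfman API.
fn glsl_version_directive(major: u32, minor: u32) -> Option<String> {
    let glsl = match (major, minor) {
        // Before OpenGL 3.3 the GLSL version numbers do not track the GL version.
        (2, 0) => 110,
        (2, 1) => 120,
        (3, 0) => 130,
        (3, 1) => 140,
        (3, 2) => 150,
        // From OpenGL 3.3 onward they do: 3.3 -> 330, 4.0 -> 400, ..., 4.6 -> 460.
        (3, 3) => 330,
        (4, m) if m <= 6 => 400 + m * 10,
        _ => return None,
    };
    Some(format!("#version {}\n", glsl))
}
```

With this mapping, a 3.0 context would get `#version 130`, which is one of the versions Mesa lists as supported in the log above.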

hecrj commented

> So Nvidia might just be silently failing (?) while Mesa told us what the problem is.

Is there any way we can test this? I had the black window issue even when using surfman without any shaders. I am not completely familiar with the API yet, so there is a chance I may be missing something.

Just so we are on the same page, I have created an SSCCE in my fork that reproduces the issue I am describing here (enable the sm-x11 feature flag). I believe this should display a blue window, but I get a black one instead.

FYI I tried doing glReadPixels on each of the FBOs prior to presentation and to submitting the completed surface to the main thread. The worker thread's glReadPixels yields the ball image. The main thread's surface always returns transparent black for the whole buffer, even if I do another glClear (OR if I remove the alpha flag from the context creation, then it returns solid black for everything).
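For anyone repeating this experiment, here is a rough sketch of how the readback can be classified once the bytes are on the CPU side. It assumes a tightly packed RGBA8 buffer (as `glReadPixels` produces with `GL_RGBA` and `GL_UNSIGNED_BYTE`); the `Readback` enum and `classify_rgba8` are hypothetical names, not part of surfman.

```rust
/// Outcome of inspecting a readback buffer (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Readback {
    /// Every byte is zero: RGBA = (0, 0, 0, 0) for all pixels.
    TransparentBlack,
    /// Every pixel is RGBA = (0, 0, 0, 255).
    OpaqueBlack,
    /// At least one pixel is something else.
    HasContent,
}

/// Classifies a tightly packed RGBA8 buffer.
/// An empty buffer is reported as `TransparentBlack`.
fn classify_rgba8(pixels: &[u8]) -> Readback {
    assert!(pixels.len() % 4 == 0, "buffer must hold whole RGBA pixels");
    if pixels.iter().all(|&b| b == 0) {
        return Readback::TransparentBlack;
    }
    let opaque_black = pixels
        .chunks_exact(4)
        .all(|p| p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 255);
    if opaque_black {
        Readback::OpaqueBlack
    } else {
        Readback::HasContent
    }
}
```

In the situation described above, the worker thread's buffer would be expected to classify as `HasContent`, and the main thread's as `TransparentBlack` (or `OpaqueBlack` without the alpha flag).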

So this seems to only happen to the surface corresponding to the native widget. @hecrj I'll try playing with your minimal repro next (apitrace is unhappy with multithreaded rendering so I have a much uglier hack that I've been playing with to try to understand what's going on).

hecrj commented

@iamralpht I also played with glReadPixels during my experiments and got similar results.

The first read, after a clear and before presenting for the first time, got me pixel data from the region of the screen behind the window (!). After presenting, the data turned to transparent black / solid black as you described.

@hecrj neat, glad we're seeing the same things. If I run apitrace on your test case, then the replay actually shows a blue window, so maybe this is something funny with context creation.

So eglCreatePlatformWindowSurface fails on NVidia, but not on Intel. Now to figure out where it's called from and what to do about it ;).

jdm commented

That's

let egl_surface = egl.CreatePlatformWindowSurface(egl_display,
.

Ok, I was wrong about eglCreatePlatformWindowSurface. It sets the error flag, but it appears to be benign, because I have a minimal C program which successfully gets a glClear to appear in a window and that triggers the same errors.

I tried matching the X11 visual of the window with the desired X11 visual of the EGLConfig since technically you should (and I guess it was only really important when there were still PseudoColor visuals). The window depth also matches. This didn't fix the problem.

Next I ensured that the EGLConfig has EGL_SURFACE_TYPE containing EGL_WINDOW_BIT. The selected config ID is the same in the working C test, and the failing surfman test. This also did not seem to change anything.
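The check itself is just a bitmask test on the value that `eglGetConfigAttrib` returns for `EGL_SURFACE_TYPE`. A small sketch, with the constants copied from the Khronos `egl.h` header; `supports_window_surfaces` is a hypothetical helper, and the query call itself is left out.

```rust
// Surface type bits as defined in the Khronos EGL header (egl.h).
const EGL_PBUFFER_BIT: i32 = 0x0001;
const EGL_PIXMAP_BIT: i32 = 0x0002;
const EGL_WINDOW_BIT: i32 = 0x0004;

/// Returns true if a config's EGL_SURFACE_TYPE mask allows window surfaces.
/// `surface_type` is assumed to be the value obtained from
/// eglGetConfigAttrib(display, config, EGL_SURFACE_TYPE, &value).
fn supports_window_surfaces(surface_type: i32) -> bool {
    (surface_type & EGL_WINDOW_BIT) != 0
}
```

A config that only advertises `EGL_PBUFFER_BIT | EGL_PIXMAP_BIT` would be rejected by this check, while one that also sets `EGL_WINDOW_BIT` passes.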

At this point, there's not much difference left between the working and non-working programs, at least from what I can see in apitrace. I made the C program unbind the window surface and re-bind it prior to calling swap buffers, and its output is still visible. glReadPixels still returns all black pixels for the surfman context. I'll keep poking, but I'm not sure what I'm missing.

Edit: the other odd thing is that everything works OK in apitrace's replay (which was what made me suspicious of the X visual mismatch). I'm not sure how eglretrace works for the window system bits, so maybe this doesn't mean much.

Edit 2: I have a Rust program, based on @hecrj's minimal sample, which uses winit to create the window and establish the X connection, and then calls EGL manually using surfman's bindings (and uses surfman's GL binding method too). This works fine. I don't even need to match visuals. So this tells me the problem isn't due to some quirk in window creation, and isn't due to some linkage or _init magic in NVidia's EGL implementation. My next step is to make the apitrace from my working Rust program match the apitrace from surfman more precisely and hopefully repro the failure. (But if anyone else can think of other obvious problems that this could be, then I'm all ears!).

Ha! Got it! If I call eglMakeCurrent with null read/draw surfaces prior to creating the window surface, then I get the black window problem.

If I don't make the context current in ContextDescriptor.from_egl_context then things seem to work--hooray!

To make a generic fix I'll need to either defer poking the GL context for version and the compatibility bit, or move the window surface creation up. @jdm any guidance on what you'd be likely to approve?

jdm commented

Sorry, what exactly does "move the window surface creation" up mean? It might help to see branches demonstrating each option.

@jdm sorry for not being very clear; I mean either I create the window surface before from_egl_context is called and just hold onto it somewhere, or I somehow avoid fetching the context information until after a window surface has been created if one is desired. This is my first expedition into surfman, so I don't know which is more likely to fit into the library. I'll see where I get to and post a PR if I hit upon something that's not too disgusting ;).
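To make the second option a bit more concrete, here is a toy model of deferring the context query. It does not use surfman's real types: `Context`, `ContextInfo`, and the recorded call names are all hypothetical, and the version values are placeholders. It only illustrates that with a lazy query the caller is free to create the window surface before anything makes the context current.

```rust
/// Hypothetical stand-in for the attributes read back from a live GL context.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ContextInfo {
    major: u8,
    minor: u8,
    compatibility: bool,
}

/// Hypothetical context wrapper that records the order of driver-facing calls.
struct Context {
    info: Option<ContextInfo>,
    calls: Vec<&'static str>,
}

impl Context {
    /// Creating the wrapper does not touch the GL context at all.
    fn new() -> Self {
        Context { info: None, calls: Vec::new() }
    }

    /// Stand-in for creating the window surface for the native widget.
    fn create_window_surface(&mut self) {
        self.calls.push("create_window_surface");
    }

    /// Queries version and compatibility bit lazily: the context is only
    /// made current (stand-in) on the first call, and the result is cached.
    fn info(&mut self) -> ContextInfo {
        if let Some(info) = self.info {
            return info;
        }
        self.calls.push("make_current_and_query");
        // Placeholder values; a real implementation would ask the driver.
        let info = ContextInfo { major: 3, minor: 0, compatibility: false };
        self.info = Some(info);
        info
    }
}
```

Note that this model does not enforce the ordering; it only stops construction from forcing the query early. The first option (creating the window surface up front and holding onto it) would enforce the order but changes who owns the surface.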