rust-windowing/glutin

Glutin fails to create a surface when Mesa's EGL 1.5 implementation for GBM is present alongside native (X11) EGL 1.4

janKilika opened this issue · 18 comments

Issue: In my understanding, per presence of Mesa's EGL 1.5 implementation for GBM, symbols for Khr platform functions are actually properly loaded. This in and of itself would not be an issue, but eglGetPlatformDisplay actually fails to error and, from what I've managed to gather through testing, yields a display that is valid as EglDisplay::Ext (note: the EGL_EXT_platform_base extension is present; I'm not sure whether such an interaction would occur if it isn't, nor is it clear what would happen EGL 1.5 and implementations older than EGL 1.4 collide). That, in turn, creates problems down the line, as EGL stably outputs a BadDisplay error when a Khr surface is attempted to be created.

Potential Solution: Though I don't think this circumstance is something that a library can sanely check for before obtaining the display connection, initializing the EGL display does yield the actual version of the active implementation. As such, having a check that downgrades EglDisplay from Khr to Ext provided the yielded version is 1.4 (note that (a) EGL_EXT_platform_base is only available for 1.4 according to the specification and (b) a Khr display wouldn't even make sense for any version that is not 1.5) and EGL_EXT_platform_base is present among client extensions could likely negate the issue.

Glutin will use the corresponding functions according to the display it created, so if you've created the Ext display, it'll use only the Ext surface/context and will never call to Khr/legacy function.

If there's a place where it still happens let me know, but I can ensure that it can not do that, since the extension, which created the display is carried in every place.

What's your glutin version? I know that some older versions were not trying to ensure that the display matches the surface/context functions.

Version originally tested is the latest on crates.io, v0.32.0, however the problem does also occur in the master branch, as the git clone exhibits the same issue, outputting a Khr display in spite of it belonging to EGL 1.4, which can only happen when egl.GetPlatformDisplay (a) is properly loaded (and it is, due to EGL 1.5 being present for GBM) and (b) doesn't actually fail (and failure here would be expected due to disparity between EGL versions for native X11 and GBM, yet somehow it does not occur, which I would suppose points to some bizarre behavior on part of Mesa's EGL implementation).

In my case, downgrading the illegal Khr display to Ext fixed the issue, as the subsequent surface creation proceeds as normal, which means that the egl.GetPlatformDisplay call returns a valid Ext display when it encounters a native display that only supports EGL 1.4.

So your issue is that you have the Khr display, but you need EXT, because Khr doesn't work? This all sounds like a mesa bug, since if Khr display for the platform you're requesting is present (I'd assume you're requesting the GDM platform in display creating as well) there's no way we'd create a Ext, since you have Khr, so why should we?

It's also not clear to me why it's illegal if you have 1.5 EGL version.

Could you post output of eglinfo?

(I'd assume you're requesting the GDM platform in display creating as well)

No, I request the X11 platform through glutin-winit. The raw window handle extracted from the window made by DisplayBuilder is an Xlib window handle.

image

And the version of EGL for X11 present on my device is 1.4, so logically Khr is not valid for it.

image

And EGL version for GBM platform is 1.5, which also means that the relevant functions are loaded.

image

And, this, coupled with display creation somehow succeeding, results in a Khr display being returned when the EGL version is actually 1.4, which makes no sense and errors out when Khr surface creation is attempted.

image
image
image

When I switch it to Ext, all the relevant functions proceed as if the display connection was instantiated through EXT and work correctly.

Why the display creation actually succeeds is beyond me and is probably, as you mentioned, a Mesa bug, however it does still affect any application that tries to use EGL and attempts deducing the version of EGL from whether certain functions are loaded.

Relevant extensions for Ext display creation are present, so Ext would (and does) otherwise work.

image

I've also noticed a similar issue when I was trying to get WGPU's GL backend to work on my machine, but WGPU's HAL avoids stumbling upon the surface creation error itself through some arcane tinkering with the configs and the EGL_NATIVE_RENDERABLE attribute, apparently refusing to create proper presentable surfaces if no config that has it marked EGL_TRUE could be found. I'm not entirely sure what that means, why it's not EGL_TRUE and why does it ultimately not affect anything for me when EXT is actually involved.

Ah, that's because you have multiple vendors, since EGL version is mostly a client thing and not related to gpus... and it probably works because of libglvnd, since you deal with mulitple EGL providers and thus it loaded pointers, but from mesa, but your X11 pointers are from the nvidia lib.

so, it can not work and it's not a mesa bug, but rather libglvnd implication.

Then the only option I see is to ensure that the display you've created is at least 1.5 and if it's not, downgrade to 1.4.

So, please send a patch since you can test reliably, the logic should go into the get_platform_display and checked with eglInitialize and then destroyed.

So, please send a patch since you can test reliably, the logic should go into the get_platform_display and checked with eglInitialize and then destroyed.

I suppose checking beforehand it with eglInitialize in get_platform_display could work, though destroying afterwards sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.

I suppose checking beforehand it with eglInitialize in get_platform_display could work, though destroying afterwards it sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.

we're not borrowing displays, so it all doesn't matter to us, if user has egl along side the glutin, it's not really our problem at this point, from what I can say.

Also, this display can not work, so closing it won't break anything because it's a non-working display anyway.

Also, this display can not work, so closing it won't break anything because it's a non-working display anyway.

It does technically work when downgraded to Ext per tests, though I guess making a supposition that such an invalid Khr display is by default valid for Ext is not a very good idea.

I suppose checking beforehand it with eglInitialize in get_platform_display could work, though destroying afterwards it sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.

we're not borrowing displays, so it all doesn't matter to us, if user has egl along side the glutin, it's not really our problem at this point, from what I can say.

That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.

That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.

On second thought, how problematic this is likely ultimately depends on whether implementations return the same displays both from eglGetPlatformDisplay and from eglGetPlatformDisplayEXT. They very well could collide since the display returned from eglGetPlatformDisplay does function properly as an EXT display.

It does technically work when downgraded to Ext per tests, though I guess making a supposition that such an invalid Khr display is by default valid for Ext is not a very good idea.

Those are completely different displays, if you close the Khr it doesn't affect Ext, since those are different generally speaking, if they close each other it's a bug in a EGL provider library, since you've passed different attributes.

Anyway, you only do downgrade when you know that Khr display for this platform doesn't work at all, as in, you have 1.4 EGL version, like in your case, the normal 1.5 path won't be affected, since there's no point in doing a downgrade, since the EGL version mandates that you have the Khr display.

That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.

You need to create the same displays, if some display doesn't work at all it won't matter, because if external library created it it won't work anyway, you're not destroying all the egl displays, you just destroy the one which matches the configuration you've provided, nothing more.

Those are completely different displays, if you close the Khr it doesn't affect Ext, since those are different generally speaking, if they close each other it's a bug in a EGL provider library, since you've passed different attributes.

By specification they should be, yet from what I've gathered after looking through libglvnd source code it seems that eglGetPlatformDisplay and eglGetPlatformDisplayEXT fetch data from the same function from vendor's getPlatformDisplay field. Without any actual differentiation. The relevant function does have the funcName parameter, yet it does not pass it to the vendor at all and only uses it for error reporting.

Anyway, you only do downgrade when you know that Khr display for this platform doesn't work at all, as in, you have 1.4 EGL version, like in your case, the normal 1.5 path won't be affected, since there's no point in doing a downgrade, since the EGL version mandates that you have the Khr display.

Yes, per extraction of the version from eglInitialize, a downgrade would only be required if the version and expected variant are incongruent.

You need to create the same displays, if some display doesn't work at all it won't matter, because if external library created it it won't work anyway, you're not destroying all the egl displays, you just destroy the one which matches the configuration you've provided, nothing more.

Display::new for EGL only accepts a RawDisplayHandle, so if somehow the same handle is managed to be passed to glutin more than once, necessarily terminating in get_platform_display could cause other instances to be invalidated.

Though, attributes could differ for X11 based on whether generated value for PLATFORM_X11_SCREEN_KHR differs from the value of PLATFORM_X11_SCREEN_EXT.

Display::new for EGL only accepts a RawDisplayHandle, so if somehow the same handle is managed to be passed to glutin more than once, necessarily terminating in get_platform_display could cause other instances to be invalidated.

It doesn't really matter since this display assumes ownership to some degree, and invalidating invalid display won't do anything, I'm fine with closing this issue and doing nothing, but keep display around may crash on exit once the library runs its on_exit hooks, like nvidia likes to do to restore hardware.

it's just libraries who work with other libraries who create the display should have a way to create their display from the foreign object, and in this case it must ensure to not close any displays, but it also means that there's no point in checking anything when you create things from the foreign object, since it's not your job. Glutin doesn't work
with foreign objects yet, so it all is irrelevant to us.

Though, attributes could differ for X11 based on whether generated value for PLATFORM_X11_SCREEN_KHR differs from the value of PLATFORM_X11_SCREEN_EXT.

Per specification (a) (b), it does not. Neither do differ enums for platform specification, both for Wayland and for X11. So ultimately, per the machinations of libglvnd, the underlying display could be same as attributes would not differ.

It doesn't really matter since this display assumes ownership to some degree

If Display is assuming ownership over the display and is unsafely passing the responsibility of regulating that onto the user, I feel like it probably should be stated somewhere in documentation as a "Safety" remark. Plus, checking for the display reference extension on drop and not terminating if it isn't present doesn't make a ton sense if Display is presumed to otherwise be owned and that nothing external is interfering. So I'm not entirely sure how safe it would be to terminate and whether that is necessary considering libglvnd provides the two functions as aliases of a single one. If the display can be presumed to be owned and that no other glutin displays exist, I suppose initializing and terminating in get_platform_display could be fine, but otherwise it would probably be wiser to check for that in initialize_display. If eglGetPlatformDisplay ultimately aliases to eglGetPlatformDisplayEXT and somehow parameters that are invalid for EXT are passed, I would presume it would probably error out anyway and proceed to get_platform_display_ext, so it I don't think it ultimately matters to check for it in get_platform_display.

Could you try checking if the display version is 1.4 and if Khr pointer to get display matches the Ext pointer and then just straight change the variant to Ext?

Maybe it'll just work, since I don't see any issue with this logic. You don't need to destroy anything that way, since you just cast the display to Ext one, since it's effectively Ext display.

Could you try checking if the display version is 1.4 and if Khr pointer to get display matches the Ext pointer and then just straight change the variant to Ext?

Maybe it'll just work, since I don't see any issue with this logic. You don't need to destroy anything that way, since you just cast the display to Ext one, since it's effectively Ext display.

Indeed one could. I'm currently going to re-test this again and then I'm going to submit a patch.