Glutin fails to create a surface when Mesa's EGL 1.5 implementation for GBM is present alongside native (X11) EGL 1.4
janKilika opened this issue · 18 comments
Issue: In my understanding, per presence of Mesa's EGL 1.5 implementation for GBM, symbols for Khr
platform functions are actually properly loaded. This in and of itself would not be an issue, but eglGetPlatformDisplay
actually fails to error and, from what I've managed to gather through testing, yields a display that is valid as EglDisplay::Ext
(note: the EGL_EXT_platform_base
extension is present; I'm not sure whether such an interaction would occur if it isn't, nor is it clear what would happen EGL 1.5 and implementations older than EGL 1.4 collide). That, in turn, creates problems down the line, as EGL stably outputs a BadDisplay
error when a Khr
surface is attempted to be created.
Potential Solution: Though I don't think this circumstance is something that a library can sanely check for before obtaining the display connection, initializing the EGL display does yield the actual version of the active implementation. As such, having a check that downgrades EglDisplay
from Khr
to Ext
provided the yielded version is 1.4 (note that (a) EGL_EXT_platform_base
is only available for 1.4 according to the specification and (b) a Khr
display wouldn't even make sense for any version that is not 1.5) and EGL_EXT_platform_base
is present among client extensions could likely negate the issue.
Glutin will use the corresponding functions according to the display it created, so if you've created the Ext
display, it'll use only the Ext
surface/context and will never call to Khr
/legacy function.
If there's a place where it still happens let me know, but I can ensure that it can not do that, since the extension, which created the display is carried in every place.
What's your glutin version? I know that some older versions were not trying to ensure that the display matches the surface/context functions.
Version originally tested is the latest on crates.io, v0.32.0, however the problem does also occur in the master branch, as the git clone exhibits the same issue, outputting a Khr
display in spite of it belonging to EGL 1.4, which can only happen when egl.GetPlatformDisplay
(a) is properly loaded (and it is, due to EGL 1.5 being present for GBM) and (b) doesn't actually fail (and failure here would be expected due to disparity between EGL versions for native X11 and GBM, yet somehow it does not occur, which I would suppose points to some bizarre behavior on part of Mesa's EGL implementation).
In my case, downgrading the illegal Khr
display to Ext
fixed the issue, as the subsequent surface creation proceeds as normal, which means that the egl.GetPlatformDisplay
call returns a valid Ext
display when it encounters a native display that only supports EGL 1.4.
So your issue is that you have the Khr display, but you need EXT, because Khr doesn't work? This all sounds like a mesa bug, since if Khr display for the platform you're requesting is present (I'd assume you're requesting the GDM platform in display creating as well) there's no way we'd create a Ext
, since you have Khr
, so why should we?
It's also not clear to me why it's illegal if you have 1.5 EGL version.
Could you post output of eglinfo
?
(I'd assume you're requesting the GDM platform in display creating as well)
No, I request the X11 platform through glutin-winit. The raw window handle extracted from the window made by DisplayBuilder is an Xlib window handle.
And the version of EGL for X11 present on my device is 1.4, so logically Khr
is not valid for it.
And EGL version for GBM platform is 1.5, which also means that the relevant functions are loaded.
And, this, coupled with display creation somehow succeeding, results in a Khr
display being returned when the EGL version is actually 1.4, which makes no sense and errors out when Khr
surface creation is attempted.
When I switch it to Ext
, all the relevant functions proceed as if the display connection was instantiated through EXT and work correctly.
Why the display creation actually succeeds is beyond me and is probably, as you mentioned, a Mesa bug, however it does still affect any application that tries to use EGL and attempts deducing the version of EGL from whether certain functions are loaded.
Relevant extensions for Ext
display creation are present, so Ext
would (and does) otherwise work.
I've also noticed a similar issue when I was trying to get WGPU's GL backend to work on my machine, but WGPU's HAL avoids stumbling upon the surface creation error itself through some arcane tinkering with the configs and the EGL_NATIVE_RENDERABLE
attribute, apparently refusing to create proper presentable surfaces if no config that has it marked EGL_TRUE
could be found. I'm not entirely sure what that means, why it's not EGL_TRUE
and why does it ultimately not affect anything for me when EXT is actually involved.
Ah, that's because you have multiple vendors, since EGL version is mostly a client thing and not related to gpus... and it probably works because of libglvnd
, since you deal with mulitple EGL providers and thus it loaded pointers, but from mesa, but your X11 pointers are from the nvidia lib.
so, it can not work and it's not a mesa bug, but rather libglvnd implication.
Then the only option I see is to ensure that the display you've created is at least 1.5 and if it's not, downgrade to 1.4
.
So, please send a patch since you can test reliably, the logic should go into the get_platform_display
and checked with eglInitialize
and then destroyed.
So, please send a patch since you can test reliably, the logic should go into the
get_platform_display
and checked witheglInitialize
and then destroyed.
I suppose checking beforehand it with eglInitialize
in get_platform_display
could work, though destroying afterwards sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.
I suppose checking beforehand it with eglInitialize in get_platform_display could work, though destroying afterwards it sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.
we're not borrowing displays, so it all doesn't matter to us, if user has egl along side the glutin, it's not really our problem at this point, from what I can say.
Also, this display can not work, so closing it won't break anything because it's a non-working display anyway.
Also, this display can not work, so closing it won't break anything because it's a non-working display anyway.
It does technically work when downgraded to Ext
per tests, though I guess making a supposition that such an invalid Khr
display is by default valid for Ext
is not a very good idea.
I suppose checking beforehand it with eglInitialize in get_platform_display could work, though destroying afterwards it sounds a bit like a mess, considering display references are not guaranteed and thus termination can possibly affect other display instances if they are present.
we're not borrowing displays, so it all doesn't matter to us, if user has egl along side the glutin, it's not really our problem at this point, from what I can say.
That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.
That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.
On second thought, how problematic this is likely ultimately depends on whether implementations return the same displays both from eglGetPlatformDisplay
and from eglGetPlatformDisplayEXT
. They very well could collide since the display returned from eglGetPlatformDisplay
does function properly as an EXT display.
It does technically work when downgraded to Ext per tests, though I guess making a supposition that such an invalid Khr display is by default valid for Ext is not a very good idea.
Those are completely different displays, if you close the Khr
it doesn't affect Ext
, since those are different generally speaking, if they close each other it's a bug in a EGL provider library, since you've passed different attributes.
Anyway, you only do downgrade when you know that Khr display for this platform doesn't work at all, as in, you have 1.4 EGL version, like in your case, the normal 1.5 path won't be affected, since there's no point in doing a downgrade, since the EGL version mandates that you have the Khr display.
That's fair. Yet, from what I can tell, it could affect display instances born from glutin as well, although I'm not entirely sure how problematic that is since I don't think one would generally try to create more than one display, and for cases like Android where the window can be removed by the OS it's probably not even possible to encounter this issue.
You need to create the same displays, if some display doesn't work at all it won't matter, because if external library created it it won't work anyway, you're not destroying all the egl displays, you just destroy the one which matches the configuration you've provided, nothing more.
Those are completely different displays, if you close the
Khr
it doesn't affectExt
, since those are different generally speaking, if they close each other it's a bug in a EGL provider library, since you've passed different attributes.
By specification they should be, yet from what I've gathered after looking through libglvnd
source code it seems that eglGetPlatformDisplay
and eglGetPlatformDisplayEXT
fetch data from the same function from vendor's getPlatformDisplay
field. Without any actual differentiation. The relevant function does have the funcName
parameter, yet it does not pass it to the vendor at all and only uses it for error reporting.
Anyway, you only do downgrade when you know that Khr display for this platform doesn't work at all, as in, you have 1.4 EGL version, like in your case, the normal 1.5 path won't be affected, since there's no point in doing a downgrade, since the EGL version mandates that you have the Khr display.
Yes, per extraction of the version from eglInitialize
, a downgrade would only be required if the version and expected variant are incongruent.
You need to create the same displays, if some display doesn't work at all it won't matter, because if external library created it it won't work anyway, you're not destroying all the egl displays, you just destroy the one which matches the configuration you've provided, nothing more.
Display::new
for EGL only accepts a RawDisplayHandle
, so if somehow the same handle is managed to be passed to glutin more than once, necessarily terminating in get_platform_display
could cause other instances to be invalidated.
Though, attributes could differ for X11 based on whether generated value for PLATFORM_X11_SCREEN_KHR
differs from the value of PLATFORM_X11_SCREEN_EXT
.
Display::new for EGL only accepts a RawDisplayHandle, so if somehow the same handle is managed to be passed to glutin more than once, necessarily terminating in get_platform_display could cause other instances to be invalidated.
It doesn't really matter since this display assumes ownership to some degree, and invalidating invalid display won't do anything, I'm fine with closing this issue and doing nothing, but keep display around may crash on exit once the library runs its on_exit hooks, like nvidia likes to do to restore hardware.
it's just libraries who work with other libraries who create the display should have a way to create their display from the foreign object, and in this case it must ensure to not close any displays, but it also means that there's no point in checking anything when you create things from the foreign object, since it's not your job. Glutin doesn't work
with foreign objects yet, so it all is irrelevant to us.
Though, attributes could differ for X11 based on whether generated value for
PLATFORM_X11_SCREEN_KHR
differs from the value ofPLATFORM_X11_SCREEN_EXT
.
Per specification (a) (b), it does not. Neither do differ enums for platform specification, both for Wayland and for X11. So ultimately, per the machinations of libglvnd
, the underlying display could be same as attributes would not differ.
It doesn't really matter since this display assumes ownership to some degree
If Display is assuming ownership over the display and is unsafely passing the responsibility of regulating that onto the user, I feel like it probably should be stated somewhere in documentation as a "Safety" remark. Plus, checking for the display reference extension on drop and not terminating if it isn't present doesn't make a ton sense if Display
is presumed to otherwise be owned and that nothing external is interfering. So I'm not entirely sure how safe it would be to terminate and whether that is necessary considering libglvnd
provides the two functions as aliases of a single one. If the display can be presumed to be owned and that no other glutin displays exist, I suppose initializing and terminating in get_platform_display
could be fine, but otherwise it would probably be wiser to check for that in initialize_display
. If eglGetPlatformDisplay
ultimately aliases to eglGetPlatformDisplayEXT
and somehow parameters that are invalid for EXT are passed, I would presume it would probably error out anyway and proceed to get_platform_display_ext
, so it I don't think it ultimately matters to check for it in get_platform_display
.
Could you try checking if the display version is 1.4
and if Khr
pointer to get display matches the Ext
pointer and then just straight change the variant to Ext
?
Maybe it'll just work, since I don't see any issue with this logic. You don't need to destroy anything that way, since you just cast the display to Ext
one, since it's effectively Ext
display.
Could you try checking if the display version is
1.4
and ifKhr
pointer to get display matches theExt
pointer and then just straight change the variant toExt
?Maybe it'll just work, since I don't see any issue with this logic. You don't need to destroy anything that way, since you just cast the display to
Ext
one, since it's effectivelyExt
display.
Indeed one could. I'm currently going to re-test this again and then I'm going to submit a patch.