ublue-os/hwe

Switching to the discrete GPU under Wayland causes black screen on internal display

alexispurslane opened this issue · 14 comments

I tried to use supergfxctl on my fresh install of silverblue-nvidia to switch from Hybrid to NvidiaNoModeset, because battery life isn't a concern for me and I'd rather get the substantial performance boost of not having to copy frames around in video memory. After rebooting, everything does seem to be running exclusively on my dGPU, but my laptop's internal display is completely black, and systemctl status supergfxd.service reports several errors like "Did not have dGPU handle" and "Could not find dGPU." This makes supergfxctl-gex not work, so I used the shell command to switch back to Hybrid graphics, but after logging out and back in (or rebooting) it's still on NvidiaNoModeset, completely ignoring my command, and still complaining about not being able to find the dGPU.

Is there a way to:

  1. Make my dGPU visible to supergfxctl again after switching to full nvidia
  2. Make my laptop's internal display work again?

I switched to silverblue-nvidia after running into nvidia driver trouble on tumbleweed, but this is far worse than anything I experienced there.

I edited /etc/supergfxctl.conf to put the default mode back to Hybrid and rebooted into that. Now only my internal display works, my external display doesn't, and the nvidia drivers aren't loaded at all, because supergfxctl still can't find my dGPU. Trying to modprobe nvidia produces an error saying that the kernel module "off" cannot be found.

So it seems like trying to switch to discrete graphics actually flips some kind of persistent switch that leaves it unable to talk to my nvidia GPU at all, since Hybrid graphics worked before.

Last time this happened, I tried resetting my image, rebasing it, etc., and none of that worked. Only a complete reinstall put things back to the status quo (that is, Hybrid mode with both displays working and "run on dgpu" working, which is better than this clusterf--- but less than ideal).

I really can't for the life of me figure out what's wrong, and it's very frustrating that an image that's supposed to work well for nvidia does this :(

Booting up into integrated graphics mode produces the same outcome as Hybrid graphics. In fact, even booting back into NvidiaNoModeset actually produces the exact same symptoms (nvidia drivers not loaded, only internal display working, etc). However, I did notice something interesting. /etc/modprobe.d/supergfxd.conf contains these lines:

# Automatically generated by supergfxd
blacklist nouveau
alias nouveau off
blacklist nvidia_drm
blacklist nvidia_uvm
blacklist nvidia_modeset
blacklist nvidia
alias nvidia off

options nvidia-drm modeset=1

Removing the dubious lines and switching to NvidiaNoModeset again and then rebooting brings me back to the original state: kernel modules loaded, nvidia-smi reporting the gpu, everything running on the dgpu... but my internal display is not working.
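
The "off" error from modprobe reported earlier would be explained by the "alias nvidia off" line: an alias makes modprobe look up the alias target ("off") instead of the real module. A minimal sketch for listing the lines in that file that keep the nvidia modules from loading is below. The function name nvidia_blocking_lines is made up for illustration, and the default path is simply the one quoted above.

```shell
# List the lines in a modprobe.d file that would stop the nvidia modules
# from loading. nvidia_blocking_lines is a hypothetical helper name; the
# default path is the file quoted in this thread.
nvidia_blocking_lines() {
    local conf="${1:-/etc/modprobe.d/supergfxd.conf}"
    # Match "blacklist nvidia*" and "alias nvidia* off".
    # nouveau lines and "options" lines are deliberately left alone.
    grep -nE '^[[:space:]]*(blacklist[[:space:]]+nvidia[A-Za-z_-]*|alias[[:space:]]+nvidia[A-Za-z_-]*[[:space:]]+off)[[:space:]]*$' "$conf"
}
```

Calling nvidia_blocking_lines with no argument checks the supergfxd file and prints each match prefixed with its line number. It only reports; editing the file is left to the reader, and supergfxd may regenerate the file when the mode changes.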

Some relevant kernel logs:
[Screenshot from 2024-03-02 12-53-46]
[Screenshot from 2024-03-02 12-54-19]

From the above it looks like what's happening is GNOME is trying to allocate the framebuffer for the internal screen using the iGPU, which doesn't work, since the MUX is switched to the dGPU, and so it gives up. Is there some way to tell GNOME which one is the right one to use?
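
To check which kernel driver sits behind each DRM device before guessing which one GNOME picked, something like the following sketch could help. It assumes the usual sysfs layout under /sys/class/drm; list_drm_drivers is a made-up name, and the root directory is a parameter only so the function can be pointed at a test tree. It does not tell GNOME which GPU to use; it only shows what the kernel exposes.

```shell
# Print each DRM card together with the kernel driver bound to it.
# list_drm_drivers is a hypothetical helper; /sys/class/drm is assumed
# to follow the standard layout (cardN/device/driver is a symlink).
list_drm_drivers() {
    local root="${1:-/sys/class/drm}" card name driver
    for card in "$root"/card[0-9]*; do
        name=${card##*/}
        # Skip connector entries such as card0-eDP-1.
        case "$name" in *-*) continue ;; esac
        # Skip cards with no bound driver (and unmatched globs).
        [ -e "$card/device/driver" ] || continue
        driver=$(basename "$(readlink -f "$card/device/driver")")
        printf '%s %s\n' "$name" "$driver"
    done
}
```

On a hybrid laptop this would typically print one line for the iGPU driver and one for nvidia, when the nvidia modules are loaded.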

Switching back into Hybrid mode (which requires a reboot) reveals that now, with the fixed config, Hybrid mode displays the same symptoms as NvidiaNoModeset mode: everything seems to go to the right gpus and so on, but my internal display is still black and supergfx can't find my dgpu.

Switching to Xorg with NvidiaNoModeset on makes all my displays work, but supergfx still can't detect my GPU. Also, the first few times I booted, before I switched anything, Hybrid mode at least showed both my screens under Wayland. That only broke after I tried to switch, so something deeper is going on here.

You might want to wipe out the file supergfx creates in etc and start over fresh, it's very easy to get that config wrong.

This is one of those tools we'd replace if a better alternative was available.

> You might want to wipe out the file supergfx creates in etc and start over fresh, it's very easy to get that config wrong.

Given that I already removed most of that file, I don't have a lot of hope that'll help, but I'll give it a shot for sure, thank you 😓

@KyleGospo Alright, I tried just deleting the whole config, and it didn't make a difference.

Also, even when I choose Xorg, GDM still doesn't display on the internal display, which is very annoying.

I fixed that by doing WaylandEnable=false, but the lingering problem is that supergfx still can't detect my dgpu, even under Xorg.
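
For reference, a sketch of how that setting could be applied from a shell. The thread does not say which file was edited; this assumes the stock Fedora location /etc/gdm/custom.conf, where a (possibly commented-out) WaylandEnable line already sits under [daemon]. The function name disable_gdm_wayland is hypothetical.

```shell
# Set WaylandEnable=false in a GDM config file so GDM starts on Xorg.
# Assumptions: the path defaults to Fedora's /etc/gdm/custom.conf, and the
# file already has a WaylandEnable line (commented out or not). If no such
# line exists, the file is left unchanged.
disable_gdm_wayland() {
    local conf="${1:-/etc/gdm/custom.conf}" tmp status=0
    tmp=$(mktemp) || return 1
    # Rewrite any "WaylandEnable=..." or "#WaylandEnable=..." line in place.
    sed -E 's/^[[:space:]]*#?[[:space:]]*WaylandEnable[[:space:]]*=.*/WaylandEnable=false/' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf" || status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run as root against the real file, then reboot or restart GDM for the change to take effect.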

Summary so far

In the beginning, Hybrid mode displays on both screens, allows supergfxctl to talk to my GPU, and allows me to run things on each GPU as expected.

However, as soon as I switch modes at all:

  1. Switching the mode for the first time makes supergfxctl persistently unable to find my dgpu even when I switch back to Hybrid.
  2. Switching the mode to NvidiaNoModeset (in order to run everything on my dgpu) makes my internal screen black under Wayland, both in that mode and persistently even in the Hybrid mode thereafter. Switching to Integrated, however, makes only my internal display work and no external displays, as expected.
  3. Both screens still work in Hybrid and NvidiaNoModeset modes under Xorg, unlike under Wayland, but supergfxctl is still borked.

The fact that Hybrid mode did work under Wayland initially, but then stopped working, indicates to me there is a deeper issue than Wayland itself.

Well, this seems to have sort of resolved itself: at least in Hybrid mode under Wayland, it works again.