Query: PCI Express Runtime Power Management and spurious PME interrupts
onelittlehope opened this issue · 7 comments
With prime-select nvidia enabled, I was getting intermittent freezes of 1 to 1.5 seconds on my laptop: the entire X session paused, including the mouse, whilst my SSH session at the time stayed responsive. Each time the issue occurred, a kernel message like the following was logged to the systemd journal:
pcieport 0000:00:01.0: PME: Spurious native interrupt!
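To check how often this message appears and whether it lines up with the freezes, the kernel log can be filtered for it. The following is a minimal sketch; count_spurious_pme is a hypothetical helper name, and it only counts matching lines on standard input.

```shell
#!/bin/sh
# Count log lines that report a spurious PME interrupt.
# Reads log text from standard input and prints the number of matches.
# count_spurious_pme is an illustrative name, not part of SUSEPrime.
count_spurious_pme() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -c 'PME: Spurious native interrupt' || true
}

# Example (kernel messages from the current boot):
#   journalctl -k -b | count_spurious_pme
```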
I mitigated the freezing issue by commenting out the following lines in the SUSEPrime udev rules:
https://github.com/openSUSE/SUSEPrime/blob/master/90-nvidia-udev-pm-G05.rules#L10-L16
The lines above seem to enable runtime power management when an NVIDIA card is in use.
Is that correct behaviour?
If a choice has been made to use the discrete card, then its performance should trump its power usage, especially if enabling runtime power management comes with the risk of causing breaking behaviour.
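Whether runtime power management is actually enabled for a device can be read from sysfs. The sketch below assumes the standard power/control and power/runtime_status attributes; show_runtime_pm is a hypothetical helper, and the device address 0000:01:00.0 in the example is an assumption that should be checked with lspci.

```shell
#!/bin/sh
# Print the runtime PM policy and current status for one PCI device.
# $1 is the device's sysfs directory.
# show_runtime_pm is an illustrative name, not part of SUSEPrime.
show_runtime_pm() {
    dev_dir=$1
    # power/control: "auto" = runtime PM allowed, "on" = device kept powered.
    control=$(cat "$dev_dir/power/control")
    # power/runtime_status: for example "active" or "suspended".
    status=$(cat "$dev_dir/power/runtime_status")
    printf 'control=%s status=%s\n' "$control" "$status"
}

# Example (assumed GPU address):
#   show_runtime_pm /sys/bus/pci/devices/0000:01:00.0
```

Writing on to power/control as root keeps the device powered until something sets it back to auto, which may be a quicker way to test the theory than editing the rules file.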
Hmm. This should only be relevant in intel mode when using PRIME Render Offload, where it is needed for NVIDIA power-off support (Turing GPUs and later).
If I understood you correctly, then there should be some logic in the prime-select.sh script, or some manual instruction, which copies the 90-nvidia-udev-pm-G05.rules file to either the /etc/udev/rules.d/ or /usr/lib/udev/rules.d/ folder when a user runs prime-select intel or prime-select intel2.
Is that correct?
If so, I can raise a bug report against the suse-prime package in openSUSE, since it currently copies the 90-nvidia-udev-pm-G05.rules file into the /usr/lib/udev/rules.d/ folder as part of the RPM package install.
Line 76 in the package spec (https://build.opensuse.org/package/view_file/openSUSE:Factory/suse-prime/suse-prime.spec?expand=1):
install -m 0644 90-nvidia-udev-pm-G05.rules %{buildroot}/usr/lib/udev/rules.d
I'll ask them not to copy the file across, which will prevent any freezing issues when someone uses prime-select nvidia.
Regarding SUSEPrime itself, would it be possible to add some logic to prime-select.sh that copies the 90-nvidia-udev-pm-G05.rules file to either the /etc/udev/rules.d/ or /usr/lib/udev/rules.d/ folder as part of running prime-select intel or prime-select intel2, or is this an optional step for users?
Makes sense, but it would need more changes. In particular, dracut would need to run afterwards and the machine would need a reboot. :-(
I would like to ask you to check first whether changing the option to
options nvidia NVreg_DynamicPowerManagement=0x01
in /etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf already fixes this issue. After changing it, you need to run "mkinitrd" or "dracut -f" and reboot the machine to make sure that the nvidia module gets loaded with the changed parameter.
Details on:
https://download.nvidia.com/XFree86/Linux-x86_64/440.82/README/dynamicpowermanagement.html
Are you already using a Turing GPU?
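The suggested change can be sketched as a small script. This is only an illustration under stated assumptions: set_dynamic_pm is a hypothetical helper name, the config file is assumed to contain a single NVreg_DynamicPowerManagement=<value> entry, and the commands in the trailing comment need to run as root.

```shell
#!/bin/sh
# Rewrite the NVreg_DynamicPowerManagement value in a modprobe config file.
# $1 = path to the config file, $2 = new value (for example 0x01).
# set_dynamic_pm is an illustrative name, not part of suse-prime.
set_dynamic_pm() {
    conf=$1
    value=$2
    tmp=$(mktemp)
    # Replace whatever value currently follows the parameter name, then
    # write the result back in place so the file keeps its permissions.
    sed -E "s/(NVreg_DynamicPowerManagement=)[^[:space:]]+/\1${value}/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (as root), followed by the initrd rebuild and reboot:
#   set_dynamic_pm /etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf 0x01
#   dracut -f
#   reboot
```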
The suse-prime openSUSE package installs a file called /etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf, which has the following contents:
options nvidia NVreg_DynamicPowerManagement=0x02
I've:
- changed NVreg_DynamicPowerManagement to 0x1 in the /etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf file
- reverted my changes to /usr/lib/udev/rules.d/90-nvidia-udev-pm-G05.rules
- run mkinitrd, which seems to call /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force /boot/initrd-5.6.2-1-default 5.6.2-1-default
- confirmed that prime-select nvidia was selected before rebooting the laptop
I can confirm that the freezing issue does not happen any more.
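To double-check that the module was really loaded with the new value after the reboot, the driver's parameter listing can be inspected. The sketch below assumes that /proc/driver/nvidia/params exists on the system and lists parameters as "Name: value" lines; both the path and the format are assumptions to verify locally, and loaded_dynamic_pm is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the DynamicPowerManagement value from driver parameter text.
# Reads "Name: value" lines on standard input and prints the value of
# the DynamicPowerManagement entry (prints nothing if it is absent).
# loaded_dynamic_pm is an illustrative name, not an NVIDIA tool.
loaded_dynamic_pm() {
    # Exact match on the name avoids longer names with the same prefix.
    awk -F': *' '$1 == "DynamicPowerManagement" { print $2 }'
}

# Example (assumed path):
#   loaded_dynamic_pm < /proc/driver/nvidia/params
```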
From looking at the dynamicpowermanagement.html link, I think that with NVreg_DynamicPowerManagement=0x02 the NVIDIA driver was incorrectly powering down the GPU even when the GPU was driving a display, hence causing a freeze when applications next tried to use the GPU.
As such, I am happy to leave the udev rules in place and use NVreg_DynamicPowerManagement=0x1 with the prime-select nvidia profile.
It would be good if options nvidia NVreg_DynamicPowerManagement=0x02 could be used with the prime-select intel profile, since this would cause the GPU to be actually powered off instead of going into a low power state.
Are you already using a Turing GPU?
The output of inxi -G reports the following, so yes, I am using a Turing GPU:
Graphics: Device-1: Intel UHD Graphics 630 driver: i915 v: kernel
Device-2: NVIDIA TU106M [GeForce RTX 2060 Mobile] driver: nvidia v: 440.82
Display: server: X.org 1.20.8 driver: modesetting,nvidia unloaded: intel tty: 171x45
Message: Advanced graphics data unavailable in console for root.
Thanks for the detailed information and explanation! Very helpful! So indeed we probably have the first user owning an NVIDIA Turing GPU in his laptop and using suse-prime. :-)
Changing the NVreg_DynamicPowerManagement value in prime-select would require a rebuild of the initrd via dracut. Honestly, I'm rather reluctant to open such a can of worms in suse-prime.
I'm going to change the default value to 0x01, since this appears to be a safe choice for now for everyone, in both intel and nvidia mode.
Fixed in current git and release 0.7.11.