ilayna/Single-GPU-passthrough-amd-nvidia

AMD GPU issue possibly? Prepare Hook hangs

seffyroff opened this issue · 7 comments

Hi there! Thanks for your work on this. I had it all working great on my Nvidia GPU, which unfortunately broke down.

I replaced it with an AMD GPU, removing nvidia drivers, installing AMD drivers, removing your script hooks and configs, then re-running your script and reconfiguring the VM with the new device. IOMMU groups look fine.

The issue is that the prepare hook script hangs, like so:

sudo /etc/libvirt/hooks/qemu.d/win10/prepare/begin/start.sh
[sudo] password for seffyroff:
+ source /etc/libvirt/hooks/kvm.conf
++ VIRSH_GPU_VIDEO=pci_0000_0b_00_0
++ VIRSH_GPU_AUDIO=pci_0000_0b_00_1
+ systemctl stop display-manager
+ echo 0
+ echo 0
+ echo efi-framebuffer.0
/etc/libvirt/hooks/qemu.d/win10/prepare/begin/start.sh: line 16: echo: write error: No such device
+ sleep 5
+ modprobe -r amdgpu
/etc/libvirt/hooks/qemu.d/win10/prepare/begin/start.sh: line 22:  9972 Killed                  modprobe -r amdgpu
+ virsh nodedev-detach pci_0000_0b_00_0

I seem to be stuck at this point, with no option but to hard reset the host. I tried killing the prepare script and then running the release script, but the release script gets stuck in the same way. Soft-rebooting also hangs indefinitely.
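
For anyone debugging a similar `modprobe -r amdgpu` stall, one thing worth checking before the unload is whether something still holds the module. The following is a minimal diagnostic sketch, not part of the repository's hooks. The function name `module_refcount` is hypothetical, and it assumes the usual `/proc/modules` layout, where the third field is the reference count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the reference count of a loaded kernel module.
# Reads /proc/modules by default; a different table can be passed for testing.
# Returns non-zero if the module is not listed (i.e. not loaded).
module_refcount() {
  local name=$1 table=${2:-/proc/modules}
  awk -v m="$name" '$1 == m { print $3; found = 1 } END { exit !found }' "$table"
}
```

A call such as `module_refcount amdgpu` prints a number while the driver is loaded. A non-zero count suggests something is still using the GPU, so the unload is unlikely to succeed until that user is gone.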

These seem to be the old hooks. Replace them with the new hooks and try again; you might want to run the whole repo again just to make sure.

Thanks for your reply. I have started cleaning out any remaining parts of the script before cloning fresh, and FYI I'm having trouble with the uninstall script.

I'm using Manjaro - the pacman line you have for Manjaro errors out:

sudo bash ./src/uninstall.sh
About to remove /etc/lbivirt/hooks/qemu, /bin/vfio-startup.sh,  /bin/vfio-teardown.sh and delete virtualization packages !
Do you wish to uninstall anyway? y/n y
rm: cannot remove '/etc/libvirt/hooks/qemu': No such file or directory
rm: cannot remove '/bin/vfio-startup.sh': No such file or directory
rm: cannot remove '/bin/vfio-teardown.sh': No such file or directory
error: invalid option '-y'
Uninstalled !

I checked and manually ran the line, omitting the '-y':

sudo pacman -R virt-manager qemu vde2 ebtables iptables-nft nftables dnsmasq bridge-utils ovmf
error: target not found: qemu
error: target not found: ebtables
error: target not found: ovmf

Removing those 3 missing packages from the command then fails on broken dependencies:

sudo pacman -R virt-manager vde2 iptables-nft nftables dnsmasq bridge-utils
checking dependencies...
error: failed to prepare transaction (could not satisfy dependencies)
:: removing bridge-utils breaks dependency 'bridge-utils' required by docker
:: removing iptables-nft breaks dependency 'iptables' required by iproute2
:: removing vde2 breaks dependency 'vde2' required by qemu-system-x86
:: removing iptables-nft breaks dependency 'iptables' required by systemd

So at this point I guess I should just carry on starting with a fresh clone?
What about changes that were made to grub? Should I manually revert those too?
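
On the pacman errors above: the "target not found" failures come from asking pacman to remove packages that are not installed. A small sketch of filtering the list first is shown below. This is only an illustration, not what uninstall.sh does; `is_installed` and `installed_only` are hypothetical helper names, and the sketch relies on `pacman -Qq <pkg>` exiting non-zero for a package that is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if pacman knows the package as installed.
is_installed() {
  pacman -Qq "$1" >/dev/null 2>&1
}

# Hypothetical helper: print, one per line, only those arguments
# that are currently installed.
installed_only() {
  local pkg
  for pkg in "$@"; do
    if is_installed "$pkg"; then
      printf '%s\n' "$pkg"
    fi
  done
}
```

The filtered list could then be passed to `pacman -R`. It would not get around the dependency errors, though: those indicate that docker, iproute2, systemd and qemu-system-x86 still need the packages in question, so they are probably best left installed.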

FYI there's a problem with the install_hooks.sh script. Running it as per the README.md results in this:

 ~/Documents/Single-GPU-passthrough-amd-nvidia | main # sudo bash ./src/install_hooks.sh                                                                                                        
cp: cannot stat 'systemd-no-sleep/libvirt-nosleep@.service': No such file or directory
cp: cannot stat 'hooks/vfio-startup.sh': No such file or directory
cp: cannot stat 'hooks/vfio-teardown.sh': No such file or directory
cp: cannot stat 'hooks/qemu': No such file or directory
chmod: cannot access '/bin/vfio-startup.sh': No such file or directory
chmod: cannot access '/bin/vfio-teardown.sh': No such file or directory
chmod: cannot access '/etc/libvirt/hooks/qemu': No such file or directory

If I cd into the src dir and run the script there instead, it completes silently, with no output to indicate success.

Yes, if you cd into src/ first that's fine; it is meant to be run that way.

That's intentional: it looks for the hooks folder in the current directory. I'll make it clearer in the README in a second. As for the uninstall not working, that's because it is meant to uninstall the new version and is not compatible with the old version, which is what you had installed.
Did you run install.sh first? What's your current situation? That way I can help you better.
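
As an aside on the "current directory" behaviour: a script that copies files using relative paths such as `hooks/qemu` only works when started from the directory that contains them. One common way to make such a script independent of the caller's working directory is to resolve the script's own location first. The sketch below assumes bash; `script_dir` is a hypothetical helper and not something the repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the absolute directory containing the given
# script path. The subshell keeps the caller's working directory unchanged.
script_dir() {
  (cd -- "$(dirname -- "$1")" && pwd -P)
}
```

Inside a script, a line like `cd "$(script_dir "${BASH_SOURCE[0]}")" || exit 1` near the top would then let relative paths resolve the same way whether the script is started from the repository root or from src/.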

Fixed your complaint here: b9104c5. Leaving this open for a bit to help you with your current situation.

Hi @wabulu thanks for your reply.

"What's your current situation at ? So I could help you better."

Well, I have a fresh clone of the repo, ran the scripts, and checked that they did what they are meant to do. I've configured my VM, and am now looking at some errors in /var/log/libvirt/custom_hooks.log:

09/14/2022 09:46:29 : Beginning of Startup!
1153 plasmashell
09/14/2022 09:46:29 : Display Manager is KDE, running KDE clause!
09/14/2022 09:46:29 : Display Manager = display-manager
09/14/2022 09:46:29 : Unbinding Console 1
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M] [1002:73df] (rev c0)
09/14/2022 09:46:29 : System has an AMD GPU
/bin/vfio-startup.sh: line 140: echo: write error: No such device
modprobe: FATAL: Module drm_kms_helper is builtin.
/bin/vfio-startup.sh: line 149: 37654 Killed                  modprobe -r amdgpu
modprobe: FATAL: Module drm is builtin.
09/14/2022 09:46:29 : AMD GPU Drivers Unloaded
09/14/2022 09:46:29 : End of Startup!
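
The two "is builtin" lines in this log mean that `drm` and `drm_kms_helper` are compiled into the running kernel, so `modprobe -r` cannot unload them. A hook could skip such modules instead of attempting the unload. The sketch below is only an illustration under that assumption: `is_builtin` is a hypothetical helper, and it reads the `modules.builtin` list that kmod-based distributions ship under `/lib/modules/<kernel release>/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named module is compiled into the kernel.
# Reads modules.builtin for the running kernel by default; another list can be
# passed for testing. Dashes and underscores are treated as equivalent.
is_builtin() {
  local name=${1//-/_}
  local list=${2:-/lib/modules/$(uname -r)/modules.builtin}
  awk -F/ -v m="$name" '
    { f = $NF; sub(/\.ko$/, "", f); gsub(/-/, "_", f); if (f == m) found = 1 }
    END { exit !found }
  ' "$list"
}
```

With this, something like `is_builtin drm_kms_helper || modprobe -r drm_kms_helper` would avoid the FATAL message on kernels where the module is built in.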

This was solved by retrying on a fresh install; it was caused by not uninstalling the old version first. The conversation took place on the Discord.