Arc-Compute/LibVF.IO

Hi, I can't figure out what is wrong here. Is the GTX 1650 not supported? It is a Turing-architecture card (GPU TU116). I have attached the output below!

pyhtonlovertytt opened this issue · 16 comments

root@TIGPC:/home/tiguser/Downloads# arcd create nvidia-mdev.yaml WIN1021h1.iso 80
bash: line 0: echo: write error: Input/output error
Formatting '/root/.local/libvf.io/live/3ce0ad37-f3bf-4559-98c9-1327b0f92de6', fmt=qcow2 size=85899345920 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@TIGPC:/home/tiguser/Downloads# /bin/qemu-system-x86_64 -D /root/.local/libvf.io/logs/qemu/3ce0ad37-f3bf-4559-98c9-1327b0f92de6-session.txt -no-hpet -uuid 3ce0ad37-f3bf-4559-98c9-1327b0f92de6 -machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu host,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,kvm=off,topoext=on -rtc clock=host,base=localtime -m 8192 -smp cores=4,threads=1,sockets=1 -hda /root/.local/libvf.io/live/3ce0ad37-f3bf-4559-98c9-1327b0f92de6 --enable-kvm --soundhw all -device rtl8139,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222-:22 -qmp unix:/tmp/sockets/3ce0ad37-f3bf-4559-98c9-1327b0f92de6/main.sock,server,nowait -qmp unix:/tmp/sockets/3ce0ad37-f3bf-4559-98c9-1327b0f92de6/master.sock,server,nowait -cdrom WIN1021h1.iso -set device.hostdev0.x-pci-device-id=6960
qemu-system-x86_64: -set device.hostdev0.x-pci-device-id=6960: there is no device "hostdev0" defined
[2021-11-28 11:02:24] - INFO: Connecting to the socket
asyncfutures.nim(389) read
Error: unhandled exception: No such file or directory [OSError]

Can you provide me with the following information:

  • GPU VRAM total (gigs of memory on your card)
  • CPU vendor and model number
  • Motherboard model number
  • Host operating system used (Arch or Ubuntu)
  • Did you follow the guide document or the video during installation?

Can you also run the following script and paste the output here for me?

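#!/bin/bash
# List every PCI device together with the IOMMU group it belongs to.
for d in /sys/kernel/iommu_groups/*/devices/*; do
  # Extract the group number from .../iommu_groups/<n>/devices/<address>
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  # The last path component is the PCI address; -nn shows vendor/device IDs
  lspci -nns "${d##*/}"
done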

Not sure if it's helpful, but I also have a 1650 Super:

  • 4 GB GDDR6
  • Intel i9 10940X
  • EVGA X299 FTW K
  • Fedora 34
  • Yes, both guides
IOMMU Group 40:
	17:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
	17:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
	17:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
	17:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)

I modified the install-ubuntu.sh script for Fedora, using the Fedora versions of the packages. Everything built, although I'm not certain the Nvidia driver installed. I get the same error message as @pyhtonlovertytt when running arcd create.

PS: One potential addition to the guides is a reminder to enable IOMMU support (Intel VT-d or AMD-Vi) in the BIOS, since VFIO depends on it and a lot of motherboards have it disabled.
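
One way to confirm that the firmware setting took effect is to check whether the kernel exposes any IOMMU groups. A minimal sketch, assuming the standard sysfs location /sys/kernel/iommu_groups (the same path the script above walks); the function name iommu_groups_present is hypothetical, and an empty directory can also mean the kernel was booted without a parameter such as intel_iommu=on:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given directory contains at least one
# IOMMU group. Defaults to the standard sysfs path.
iommu_groups_present() {
  local root="${1:-/sys/kernel/iommu_groups}"
  local entry
  for entry in "$root"/*; do
    # With no matches the glob stays literal, so test that the entry exists.
    [ -e "$entry" ] && return 0
  done
  return 1
}

# Usage:
#   iommu_groups_present && echo "IOMMU groups found" || echo "no IOMMU groups"
```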

@AugustNagro Someone on our team is working on adding support for kernel 5.14 for Nvidia users, which I believe is the current kernel version used in Fedora 35. I'm not sure about Fedora 34, however.

If you're using an Nvidia GPU, the nv merged driver from the guide is definitely a dependency.

If you want to contribute a pull request with your modified install-fedora.sh script then I'll try my best to test/validate it on my end then merge it to our repo if it works well. :)

@arthurrasmusson Yes, Fedora 34 is on 5.14.

If you're using an Nvidia GPU, the nv merged driver from the guide is definitely a dependency.

Sounds good. I haven't yet been able to unload the current nvidia drivers to install the new one. But I should figure that out soon.

Does it matter that I'm currently on the proprietary driver instead of nouveau? For a long time nouveau didn't support the 1650, but it looks like it does now.

If you want to contribute a pull request with your modified install-fedora.sh script then I'll try my best to test/validate it on my end then merge it to our repo if it works well. :)

Roger that, when I can get something working I'll make a draft PR.

I'm having a similar issue on a fresh installation of Kubuntu 20.04.3

[roger@roger-kubuntu libvf.io]$ arcd create /home/roger/nvidia-mdev.yaml /home/roger/windows10.iso 200
Formatting '/home/roger/.local/libvf.io/live/f7344aeb-3a82-474a-9fde-1ca41572f8c5', fmt=qcow2 size=214748364800 cluster_size=65536 lazy_refcounts=off refcount_bits=16

[roger@roger-kubuntu libvf.io]$ /bin/qemu-system-x86_64 -D /home/roger/.local/libvf.io/logs/qemu/f7344aeb-3a82-474a-9fde-1ca41572f8c5-session.txt -no-hpet -uuid f7344aeb-3a82-474a-9fde-1ca41572f8c5 -machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu host,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,kvm=off,topoext=on -rtc clock=host,base=localtime -m 8192 -smp cores=4,threads=1,sockets=1 -hda /home/roger/.local/libvf.io/live/f7344aeb-3a82-474a-9fde-1ca41572f8c5 --enable-kvm -device rtl8139,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222-:22 -qmp unix:/tmp/sockets/f7344aeb-3a82-474a-9fde-1ca41572f8c5/main.sock,server,nowait -qmp unix:/tmp/sockets/f7344aeb-3a82-474a-9fde-1ca41572f8c5/master.sock,server,nowait -cdrom /home/roger/windows10.iso -set device.hostdev0.x-pci-device-id=6960
qemu-system-x86_64: -set device.hostdev0.x-pci-device-id=6960: there is no device "hostdev0" defined
[2021-11-30 17:20:00] - INFO: Connecting to the socket
asyncfutures.nim(389)    read
Error: unhandled exception: No such file or directory [OSError]

GPU VRAM: 4GB
GPU: Nvidia GeForce GTX 980M
CPU: Intel Core i7-6700HQ
MB chipset: Intel HM170
Host OS: Kubuntu 20.04.3
I've been following the document, not the video.

Output from the script:

IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910] (rev 07)
IOMMU Group 10 03:00.0 Network controller [0280]: Intel Corporation Wireless 7265 [8086:095a] (rev 61)
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
IOMMU Group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06)
IOMMU Group 3 00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
IOMMU Group 3 00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
IOMMU Group 4 00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
IOMMU Group 5 00:17.0 SATA controller [0106]: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] [8086:a103] (rev 31)
IOMMU Group 6 00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
IOMMU Group 7 00:1c.5 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #6 [8086:a115] (rev f1)
IOMMU Group 8 00:1f.0 ISA bridge [0601]: Intel Corporation HM170 Chipset LPC/eSPI Controller [8086:a14e] (rev 31)
IOMMU Group 8 00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
IOMMU Group 8 00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
IOMMU Group 8 00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
IOMMU Group 9 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader [10ec:5287] (rev 01)
IOMMU Group 9 02:00.1 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 12)

I didn't change any settings in nvidia-mdev.yaml

@AugustNagro

Does it matter that I'm currently on the proprietary driver instead of nouveau?

On Nvidia hardware the errors you're seeing are consistent with a setup which lacks the mdev driver.

You'll want to repeat the steps detailed in the setup guide on a supported operating system, making sure you install the driver detailed in section 5.1.4 Host Mdev GPU Driver. Please note that the optional nv merged driver linked in section 5.1.4 is not the same as the Nvidia consumer or server drivers that ship with your OS package manager; it is an entirely different driver package with different functionality.
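
One way to check whether an mdev-capable driver is actually loaded is to look for mediated device types in sysfs. A minimal sketch, assuming the kernel's standard /sys/class/mdev_bus layout; list_mdev_types is a hypothetical helper name, and an empty result only suggests (it does not prove) that the merged driver is missing:

```shell
#!/bin/bash
# Hypothetical helper: print "<pci-address> <mdev-type>" for every mediated
# device type a parent device advertises. Returns non-zero if none are found.
list_mdev_types() {
  local root="${1:-/sys/class/mdev_bus}"
  local found=1 type_dir parent
  for type_dir in "$root"/*/mdev_supported_types/*; do
    [ -d "$type_dir" ] || continue   # glob matched nothing
    parent="${type_dir%/mdev_supported_types/*}"
    printf '%s %s\n' "${parent##*/}" "${type_dir##*/}"
    found=0
  done
  return "$found"
}

# Usage:
#   list_mdev_types || echo "no mdev types exposed; is the mdev driver loaded?"
```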

@rogerskeie Try following the guide on vanilla Ubuntu 20.04 Desktop and, if you are using an Nvidia consumer card, make sure you include the optional merged driver package as detailed in the installation docs.

If you are still having trouble on vanilla Ubuntu 20.04 Desktop using the latest commits, please dump your log file for me using the following command:
./scripts/generate-debug-info.sh

This will generate a ./logs/debug.log - you can send that to me here or through Discord and I'll do my best to lend you a hand with your setup! :)

@arthurrasmusson I got the same error again with a vanilla Ubuntu (and found Gnome a horrible user experience compared to KDE and Windows 10 by the way, even after installing Dash to Panel).

debug.log

I did get these error messages when running the script to generate the debug log by the way, so I tried again and got the same error. When I checked the log file, I saw that it had written stuff there though.

pcilib: sysfs_read_vpd: read failed: Input/output error
pcilib: sysfs_read_vpd: read failed: Input/output error

I did have the Nvidia merged driver package in the optional folder of course, like I did when I tried it on Kubuntu.

Before I tried again with vanilla Ubuntu I did have a go trying to install it in Manjaro KDE using the arch script, and eventually (after installing some missing things including yay and the kernel header files) got it to the point that it started installing the Nvidia driver. At the end of the install process it was saying it had failed though, and when I looked at the installer log I saw that there were errors when compiling (unfortunately I didn't make a copy of that log file, so it got overwritten when I installed Ubuntu).

I presume this has to do with the latest Manjaro having kernel 5.13, since I saw it mentioned somewhere that libvf.io currently doesn't support kernels newer than 5.11.

Maybe I'll try installing Manjaro again after work tomorrow and try downgrading the kernel to 5.11

PS: I suggest adding a check in the arch install script to see if yay is installed, and if it isn't, install it with pacman, since a lot of that script depends on things installed by yay. It isn't installed by default in Manjaro at least.
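
A check like that could look something like the sketch below. The helper name ensure_command is hypothetical, and installing yay through pacman assumes a yay package is available in the distribution's repositories (on plain Arch it normally comes from the AUR instead):

```shell
#!/bin/bash
# Hypothetical helper: run the given install command only when the named
# program is not already on PATH.
ensure_command() {
  local name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    return 0
  fi
  echo "$name not found, running: $*" >&2
  "$@"
}

# Usage in an install script (assumes a yay package exists in the repos):
#   ensure_command yay sudo pacman -S --needed --noconfirm yay
```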

By the way, before I spend too much time trying to get this to work, I have a question.

I saw in the video where someone was demonstrating libvf.io in action that they were playing in a window instead of full screen. Was that just to be able to demonstrate Tuxracer and glxgears running alongside a game on the guest OS, or is that a technical limitation meaning it's not possible to run it in 1080p fullscreen?

Thanks @arthurrasmusson and I appreciate your patience :)

I switched to Fedora kernel 5.11 and the modified Nvidia driver got closer, but had compile errors like:

ERROR: modpost: "nvUvmInterfaceGetExternalAllocPtes [..] undefined!

which aborted the installation.

@rogerskeie Try using lspci -vnn to see what kernel modules you have available for your Nvidia GPU. For me, I had nvidia, nvidia_drm, nvidia_vgpu_vfio, and nvidiafb. Adding those under MODULES=() in mkinitcpio.conf, regenerating my kernel image, and rebooting seems to have fixed the 'there is no device "hostdev0" defined' error for me.
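
To compare the bound driver with the available modules without reading the full lspci dump, the relevant lines can be filtered out. A sketch, assuming lspci's usual -k output format ("Kernel driver in use:" and "Kernel modules:" lines); the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers that read `lspci -nnk` output on stdin.

# Print the driver currently bound to each device in the input.
driver_in_use() {
  sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Print the candidate kernel modules, one per line.
available_modules() {
  sed -n 's/^[[:space:]]*Kernel modules: //p' | tr -d ' ' | tr ',' '\n'
}

# Usage (10de is NVIDIA's PCI vendor ID, as seen in the lspci lines above):
#   lspci -nnk -d 10de: | driver_in_use
#   lspci -nnk -d 10de: | available_modules
```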

@rogerskeie We have a patch file in patches called twelve.patch which supports kernel 5.12.
The author of twelve.patch is here:
https://github.com/rupansh/vgpu_unlock_5.12/blob/master/twelve.patch
We apply that automatically if the script detects a kernel newer than 5.11. One guy on our team is focusing full time on getting 5.14 working, as there have been more breaking changes since 5.12, and by popular demand (Fedora 35) we're looking into support for its shipped kernel (5.14, I believe).
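
The version check described here could be done roughly as follows. This is a sketch, not the project's actual install logic: kernel_newer_than is a hypothetical name, and it compares only the major and minor numbers of a release string as printed by uname -r:

```shell
#!/bin/bash
# Hypothetical helper: succeed if release string $1 (e.g. "5.13.0-22-generic")
# is newer than the major.minor version given as $2 and $3.
kernel_newer_than() {
  local release="$1" want_major="$2" want_minor="$3"
  local major minor rest
  IFS=. read -r major minor rest <<<"$release"
  minor="${minor%%[!0-9]*}"   # strip suffixes such as "-rc1"
  if [ "$major" -ne "$want_major" ]; then
    [ "$major" -gt "$want_major" ]
    return
  fi
  [ "$minor" -gt "$want_minor" ]
}

# Usage:
#   if kernel_newer_than "$(uname -r)" 5 11; then
#     echo "kernel is newer than 5.11; a patch such as twelve.patch would apply"
#   fi
```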

I saw in the video where someone was demonstrating libvf.io in action that they were playing in a window instead of full screen. Was that just to be able to demonstrate Tuxracer and glxgears running alongside a game on the guest OS, or is that a technical limitation meaning it's not possible to run it in 1080p fullscreen?

No, it was just to demonstrate in the video. I use Looking Glass fullscreen at 2560x1440 most of the time (that's what my monitor tops out at). Just press CapsLock+F to go fullscreen when you're running a LibVF.IO VM.

@rogerskeie Try using lspci -vnn to see what kernel modules you have available for your Nvidia GPU. For me, I had nvidia, nvidia_drm, nvidia_vgpu_vfio, and nvidiafb. Adding those under MODULES=() in mkinitcpio.conf, regenerating my kernel image, and rebooting seems to have fixed the 'there is no device "hostdev0" defined' error for me.

I just tried that in a fresh Manjaro install I had installed kernel 5.10 on, setting MODULES=(nvidia nvidia_drm nvidia_vgpu_vfio) in /etc/mkinitcpio.conf and rebuilding it with mkinitcpio -p linux510 and then rebooting, but it didn't help. It still says the loaded kernel driver for my Nvidia card is "nvidia", and I still get the "there is no device "hostdev0" defined" error.

I even tried putting only nvidia_vgpu_vfio in the MODULES array, but it made no difference. I also tried adding nvidia and nvidia_drm to /etc/modprobe.d/blacklist.conf and doing mkinitcpio -p linux510 again, hoping that would force it to load the nvidia_vgpu_vfio driver instead of the nvidia driver, but it still said that the loaded kernel driver was nvidia after reboot.

The same when I tried it in Ubuntu. I tried adding those modules to /etc/initramfs-tools/modules and doing sudo update-initramfs -u -k all and rebooting, but lspci -vnn still says that the loaded kernel driver is nvidia, and I get the same error when trying to set up the VM.

By the way, "nvidiafb" is not on the list of modules when I do 'lspci -vnn' in Manjaro, but it is in Ubuntu.

@rogerskeie We have a patch file in patches called twelve.patch which supports kernel 5.12. The author of twelve.patch is here: https://github.com/rupansh/vgpu_unlock_5.12/blob/master/twelve.patch We apply that automatically if the script detects a kernel newer than 5.11. One guy on our team is focusing full time on getting 5.14 working, as there have been more breaking changes since 5.12, and by popular demand (Fedora 35) we're looking into support for its shipped kernel (5.14, I believe).

I saw in the video where someone was demonstrating libvf.io in action that they were playing in a window instead of full screen. Was that just to be able to demonstrate Tuxracer and glxgears running alongside a game on the guest OS, or is that a technical limitation meaning it's not possible to run it in 1080p fullscreen?

No, it was just to demonstrate in the video. I use Looking Glass fullscreen at 2560x1440 most of the time (that's what my monitor tops out at). Just press CapsLock+F to go fullscreen when you're running a LibVF.IO VM.

That's good to hear :) At the moment I'm very tempted to make another serious attempt at switching to Linux again (with Manjaro being my favorite of the ones I've tried so far). I think it would certainly be beneficial for my workflow as a web developer, instead of having to deal with WSL. I just want to make sure I can get Windows properly working in a VM under it for if/when I need it.

@arthurrasmusson Have you had a chance to have a look at the debug log I posted yet?

@rogerskeie

I also tried adding nvidia and nvidia_drm to /etc/modprobe.d/blacklist.conf and doing mkinitcpio -p linux510 again, hoping that would force it to load the nvidia_vgpu_vfio driver instead of the nvidia driver, but it still said that the loaded kernel driver was nvidia after reboot.

I tried adding those modules to /etc/initramfs-tools/modules and doing sudo update-initramfs -u -k all and rebooting, but lspci -vnn still says that the loaded kernel driver is nvidia, and I get the same error when trying to set up the VM.

I believe nvidia is the correct driver, at least that's what shows for me and the VM is working.
I also noticed I would get the hostdev0 error if there was something in my nvidia-mdev.yaml it didn't like, or if the Windows VM didn't shut down properly. For example, when I set max VRAM to 8000 and min to 7000, it would throw the hostdev0 error, but dropping them to 7000/6000 respectively let the VM boot.

By the way, "nvidiafb" is not on the list of modules when I do 'lspci -vnn' in Manjaro, but it is in Ubuntu.

This appears to be related to the kernel (linux-xanmod-lts) I had installed. When I tried again with linux-lts, I didn't have nvidiafb either, and I'm using Arch.

I ended up getting the VM booted and running through Looking Glass with the Nvidia drivers, but unfortunately I am not getting native(ish) performance. The problem doesn't seem to be the VM itself but Looking Glass: it lags even when my in-game frame counter shows a solid 60 FPS. When downloading a game on Steam, Looking Glass kept starting and stopping framethread capture in a loop; once the download stopped, Looking Glass recovered. The other two things I wanted to figure out were whether enabling a high refresh rate is possible and whether you can pass through a storage device for closer-to-native I/O performance, but I'd like to get Looking Glass working properly first.

At the moment I'm very tempted to make another serious attempt at switching to Linux again (with Manjaro being my favorite of the ones I've tried so far).

Glad to hear, I've been doing the same. Currently dual-booting but spend most of my time in Arch, unless there is a game that Proton cannot handle (which I hope to mostly solve with a VM, of course). Manjaro is a great pick too, the AUR is truly a blessing.

@Derple343 I dropped the VRAM to 3000/2000, and it worked, thank you! :)

I guess it didn't like that I was basically trying to give the VM all 4 GB of available dedicated VRAM and leaving nothing for the host, which the default settings of 4000/3000 in nvidia-mdev.yaml happened to do for my particular GPU.
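
That rule of thumb can be checked before launching a VM. The sketch below is only a heuristic drawn from the numbers reported in this thread: vram_budget_ok is a hypothetical helper, and the 512 MB of host headroom is an assumed default, not a documented LibVF.IO requirement:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the requested guest VRAM range leaves some
# headroom for the host. All values are in megabytes.
#   $1 total VRAM on the card, $2 max guest VRAM, $3 min guest VRAM,
#   $4 host headroom (assumed default: 512)
vram_budget_ok() {
  local total="$1" max="$2" min="$3" headroom="${4:-512}"
  if [ "$min" -gt "$max" ]; then
    echo "min ($min) exceeds max ($max)" >&2
    return 1
  fi
  if [ "$max" -gt $((total - headroom)) ]; then
    echo "max ($max) leaves less than $headroom MB for the host" >&2
    return 1
  fi
  return 0
}

# Usage with the values from this thread on a 4 GB (4096 MB) card:
#   vram_budget_ok 4096 4000 3000   # fails
#   vram_budget_ok 4096 3000 2000   # succeeds
```

Where nvidia-smi is available on the host, the total can be read with nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits and passed in as the first argument.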

As this issue appears to be fixed I'm going to close it. If I missed anything ping me and I'll reopen it. :)