vgpu_unlock not working with 1050Ti as 2 x Tesla P40
nworbneb opened this issue · 8 comments
Many thanks for developing this script. I'm trying to split a 4 GB 1050 Ti and pass it through as 2 x 2 GB Tesla P40s. I'm using Proxmox 7.0-11. I've previously passed through the whole card to a Windows 10 VM successfully.
NVIDIA drivers seem to be installed OK:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.04 Driver Version: 460.32.04 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... On | 00000000:04:00.0 Off | N/A |
| 0% 36C P8 N/A / 72W | 2046MiB / 4095MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I've defined two vGPUs:
ed49b5c8-8a6c-4621-98e5-fac5204b9c99 0000:04:00.0 nvidia-47 (defined)
a4cdce3c-5631-47bf-b9c0-89fa14f02fe3 0000:04:00.0 nvidia-47 (defined)
Proxmox QEMU VM is configured:
agent: 1
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/ed49b5c8-8a6c-4621-98e5-fac5204b9c99,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b38,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11a0' -uuid ed49b5c8-8a6c-4621-98e5-fac5204b9c99
bios: ovmf
boot: order=virtio0;net0;ide0;ide2
cores: 8
cpu: host
ide2: local:iso/virtio-win-0.1.196.iso,media=cdrom,size=486642K
machine: pc-q35-6.0
memory: 16384
name: retroemu.pve.xarin.com
net0: virtio=F6:08:58:72:87:9B,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=a091ae9f-ba27-4f3b-be09-84af7950be40
sockets: 1
vga: none
virtio0: local-zfs:base-150-disk-0/vm-153-disk-0,cache=writeback,discard=on,size=100G
vmgenid: ed49b5c8-8a6c-4621-98e5-fac5204b9c99
It looks like the NVIDIA driver is ok?
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389029]: vgpu_unlock loaded.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: vgpu_unlock loaded.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid ed49b5c8-8a6c-4621-98e5-fac5204b9c99 GPU PCI id 00:04:00.0 config params vgpu_type_id=47
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=47
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_env_log: Successfully updated env symbols!
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0x20801322 failed.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0x2080014b failed.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0xa0810115 failed.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): gpu-pci-id : 0x400
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): Framebuffer: 0x74000000
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1b38:0x11e9
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: ######## vGPU Manager Information: ########
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: Driver Version: 460.32.04
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0x2080012f failed.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): Cannot query ECC status. vGPU ECC support will be disabled.
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): vGPU migration disabled
Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: display_init inst: 0 successful
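When comparing a log like the one above against one from a working host, it can help to pull out only the `op_type: ... failed.` lines. This is a minimal sketch: `failed_op_types` is a hypothetical helper name, and it only assumes the line format shown in the log above. It does not say anything about what the individual codes mean or whether a given failure is harmless.

```python
import re


def failed_op_types(log_text):
    """Return the op_type codes reported as failed, in order of appearance."""
    # Matches lines such as "op_type: 0x20801322 failed." (format taken from the log above).
    return re.findall(r"op_type: (0x[0-9a-fA-F]+) failed\.", log_text)


if __name__ == "__main__":
    sample = (
        "Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0x20801322 failed.\n"
        "Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: notice: vmiop_log: (0x0): FRL Value: 60 FPS\n"
        "Oct 05 17:08:59 pve nvidia-vgpu-mgr[389041]: op_type: 0x2080012f failed.\n"
    )
    print(failed_op_types(sample))
```

The text passed in could come from saved journal output for the nvidia-vgpu-mgr service; how you capture it is up to you.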
But when I start the VM I get an error, and in the VM there is no NVIDIA device visible.
root@pve:~# qm start 153
no efidisk configured! Using temporary efivars disk.
kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/ed49b5c8-8a6c-4621-98e5-fac5204b9c99,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b38,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11a0: warning: vfio ed49b5c8-8a6c-4621-98e5-fac5204b9c99: Could not enable error recovery for the device
Any suggestions would be appreciated.
For some reason your VM's UUID equals the vGPU's UUID. This should not happen and might be the cause of the issue. You should pass the UUID of the vGPU as the -device argument and the UUID of the VM (system) to the -uuid argument.
Example from https://docs.nvidia.com/grid/12.0/grid-vgpu-user-guide/index.html#adding-vgpu-to-red-hat-el-kvm-vm-qemu-cli
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/aa618089-8b16-4d01-a136-25a0f3c73123 \
-uuid ebb10a6e-7ac9-49aa-af92-f56bb8c65893
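To see at a glance which UUIDs a VM is actually being started with, the args line can be parsed and the two values compared. This is only a sketch: `extract_uuids` is a hypothetical helper, and it assumes the `sysfsdev=/sys/bus/mdev/devices/<uuid>` and `-uuid <uuid>` forms shown above. It reports the values and leaves the judgement to you.

```python
import re

UUID_RE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"


def extract_uuids(args_line):
    """Return (mdev_uuid, vm_uuid) found in a QEMU args string; either may be None."""
    mdev = re.search(r"sysfsdev=/sys/bus/mdev/devices/(" + UUID_RE + ")", args_line)
    vm = re.search(r"(?:^|\s)-uuid\s+(" + UUID_RE + ")", args_line)
    return (
        mdev.group(1).lower() if mdev else None,
        vm.group(1).lower() if vm else None,
    )


if __name__ == "__main__":
    line = (
        "args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/"
        "ed49b5c8-8a6c-4621-98e5-fac5204b9c99,display=off' "
        "-uuid ed49b5c8-8a6c-4621-98e5-fac5204b9c99"
    )
    mdev_uuid, vm_uuid = extract_uuids(line)
    print("mdev uuid:", mdev_uuid)
    print("vm uuid:  ", vm_uuid)
    print("identical:", mdev_uuid == vm_uuid)
```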
Thank you for your assistance. I tried using a different UUID (ebb10a6e-7ac9-49aa-af92-f56bb8c65893 from the example) and the UUID from the PVE vmgenid (ed49b5c8-8a6c-4621-98e5-fac5204b9c99) but neither made any difference.
I still get the "Could not enable error recovery for the device" error when starting the VM and I don't have any NVIDIA devices in the device manager in Win10. I can only connect to the VM with RDP; Parsec shows a black screen (to be expected with no GPU).
I tried removing the UUID argument entirely and I get:
root@pve:~# qm start 153
no efidisk configured! Using temporary efivars disk.
kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/ed49b5c8-8a6c-4621-98e5-fac5204b9c99,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b38,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11a0: vfio ed49b5c8-8a6c-4621-98e5-fac5204b9c99: error getting device from group 61: Connection timed out
Verify all devices in group 61 are bound to vfio-<bus> or pci-stub and not already in use
start failed: QEMU exited with code 1
Group 61 is the device assigned to the vGPU:
/sys/kernel/iommu_groups/60/devices/a4cdce3c-5631-47bf-b9c0-89fa14f02fe3
/sys/kernel/iommu_groups/61/devices/ed49b5c8-8a6c-4621-98e5-fac5204b9c99
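For anyone who wants to reproduce the listing above, here is a rough sketch that walks the IOMMU group tree and prints each group with its devices. `iommu_groups` is a hypothetical helper name, and the only assumption is the `/sys/kernel/iommu_groups/<group>/devices/<device>` layout shown in the two paths above.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of device names in that group."""
    groups = {}
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        # Group directories are numeric; skip anything else.
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups


if __name__ == "__main__":
    root = "/sys/kernel/iommu_groups"
    if os.path.isdir(root):
        for group, devices in sorted(iommu_groups(root).items()):
            for dev in devices:
                print(f"group {group}: {dev}")
    else:
        print("no IOMMU groups directory found on this host")
```

The mdev devices show up under their UUIDs, so the group reported in the QEMU error can be matched to a vGPU directly.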
I also tried removing the PCI spoofing:
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/ed49b5c8-8a6c-4621-98e5-fac5204b9c99' -uuid 00000000-0000-0000-0000-000000000153
but the result is the same: the VM boots into Win10 but does not show any NVIDIA hardware in Device Manager.
I am also having the same issue. My config and observations are below.
1 GPU nvidia 1080 for passthrough (to a single VM)
1 GPU nvidia 1050 Ti 4GB and passing through as 2 x P40 (P6000 driver being used)
I can see the device in the Windows guest, and Parsec occasionally shows one or two frames and then hangs.
If I use RDP to connect, the GUI freezes every few seconds and the system becomes unresponsive.
I see no vgpu unlock message in dmesg
Oh yes, make sure you use the 5.12 patches.
It seems to work fine on my system. Make sure you are using an older, supported driver version, not the latest one. I am using 460.
Try using Proxmox 6.4.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 On | 00000000:01:00.0 Off | N/A |
| 0% 46C P8 N/A / 70W | 2039MiB / 2045MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 6661 C+G vgpu 1014MiB |
| 0 N/A N/A 6735 C+G vgpu 1014MiB |
+-----------------------------------------------------------------------------+
The GTX 1050 works fine with this config:
agent: 1
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/d1bdac94-7044-11ec-90d6-0242ac120003,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0' -uuid d1bdac94-7044-11ec-90d6-0242ac120003
audio0: device=ich9-intel-hda,driver=none
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
ide2: none,media=cdrom
machine: pc-q35-5.2
memory: 12288
name: Windows2
net0: virtio=AC:BD:EF:9D:38:14,bridge=vmbr0
numa: 0
ostype: win10
scsi0: Nvme:9997/base-9997-disk-0.qcow2/102/vm-102-disk-0.qcow2,discard=on,size=256G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0ecd7afc-f508-4958-9e4b-cfbedabf4f25
sockets: 1
vga: none
vmgenid: 76952324-df1a-4cd9-ae15-6fe51289c686
Please make sure the two UUIDs in the args line are the same, and add an EFI disk when you use OVMF.
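The EFI disk point can be checked mechanically against a pasted VM config. The sketch below is an illustration only: `parse_vm_config` and `missing_efidisk` are hypothetical helpers, and it assumes the flat `key: value` layout of the configs posted in this thread, with the EFI disk stored under the `efidisk0` key. It does not handle snapshot sections in the config file.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines into a dict, splitting on the first colon only."""
    cfg = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def missing_efidisk(cfg):
    """True when the VM uses OVMF firmware but has no efidisk0 entry."""
    return cfg.get("bios") == "ovmf" and "efidisk0" not in cfg


if __name__ == "__main__":
    sample = "bios: ovmf\ncores: 8\nmachine: pc-q35-6.0\nvga: none\n"
    cfg = parse_vm_config(sample)
    print("efidisk missing:", missing_efidisk(cfg))
```

A config that triggers this check is the kind that produces the "no efidisk configured! Using temporary efivars disk." message at VM start.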
Hello, can you tell me which version of the vGPU driver you use, and which driver version your guest virtual machine (Win10) uses?
460.73.01 merged driver
The same 460-series driver for the Windows client.