ipaqmaster/vfio

Issues with mounting a partition as a drive

zorbathut opened this issue · 5 comments

I'm trying to use this script to run a QEMU VFIO Windows 10 installation. I've gotten this installation working within virt-manager, but I thought I'd try this out to solve some lingering issues. Unfortunately I've found more issues.

Here's the commandline I'm currently using (I mangled the drive serial number):

./main -image /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1 -imageformat raw -m 16G -hyperv -pinvcpus 0,1,2,3,4,5,6,7

which is generating the following qemu flags:

-machine q35,accel=kvm,kernel_irqchip=on -enable-kvm -m 16384 -cpu host,kvm=on,topoext=on,hv-frequencies,hv-relaxed,hv-reset,hv-runtime,hv-spinlocks=0x1fff,hv-stimer,hv-synic,hv-time,hv-vapic,hv-vpindex -smp sockets=1,cores=4,threads=2 -name main,debug-threads=on -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/ovmf/x64/OVMF_CODE.fd -serial mon:stdio -nodefaults -drive file=/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1,if=none,discard=on,id=drive1,format=raw -device virtio-blk-pci,drive=drive1,id=disk1,iothread=iothread1 -object iothread,id=iothread1 -display sdl -vga virtio

Note that this is mounting a partition of a disk as a full disk within the VM. I'm not sure how standard this is! Maybe that's causing the issue.

This ends up dropping me into an EFI boot shell, which isn't really desired. If I open up FS0 and try running the efi boot program manually, it actually sits there with a Windows boot throbber for an appropriate amount of time, but then hits a bluescreen where it tells me the drive is inaccessible. I assume this means the drive isn't being exposed in the way Windows expects.

Unfortunately I have no idea what Windows is expecting, at least in terms of qemu commandline flags. Here's (I believe) the relevant XML from my working virt-manager setup:

<disk type="block" device="disk">
  <driver name="qemu" type="raw"/>
  <source dev="/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1" startupPolicy="mandatory"/>
  <backingStore/>
  <target dev="sda" bus="sata"/>
  <boot order="1"/>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>

and here's what I believe are the relevant clauses from the commandline that virt-manager ends up running:

-blockdev {"driver":"host_device","filename":"/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}
-device {"driver":"ide-hd","bus":"ide.0","drive":"libvirt-2-format","id":"sata0-0-0","bootindex":1}

but I'm not sure how to translate that into the commandline flags that this script is generating.
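If I had to guess, the equivalent in the older -drive/-device syntax would be something like the following - untested, and the id/bus names are just my guesses based on the q35 machine's built-in SATA controller:

-drive file=/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1,if=none,format=raw,discard=unmap,id=satadrive1
-device ide-hd,drive=satadrive1,bus=ide.0,bootindex=1

i.e. the same raw block device, but attached as a SATA ide-hd disk instead of virtio-blk-pci.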

Any suggestions? I'm happy to help debug. For what it's worth, I want your script for the easy hugepages and hostaudio setup, but right now I'm sorta stuck :)

Hey there,

This setup you're using is a little weird, but in theory it should be fine. I assume partition 1 of that JS600 disk contains an entire extra/nested GUID partition table, with more partitions inside it (including an EFI partition) for Windows to boot from? That should work just fine when passed through whole as raw, even though it looks funny from the host's point of view. You may be able to confirm this by running fdisk -l /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1 and checking whether more nested partitions are listed.

Otherwise, if the entire disk is intended for the guest, you will have to pass the whole thing through without specifying a partition.
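If that turned out to be the case here, the invocation would just be your original command pointed at the whole disk rather than -part1, something like:

./main -image /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD -imageformat raw -m 16G -hyperv -pinvcpus 0,1,2,3,4,5,6,7

For reference, here's your libvirt disk definition again: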

<disk type="block" device="disk">
  <driver name="qemu" type="raw"/>
  <source dev="/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1" startupPolicy="mandatory"/>
  <backingStore/>
  <target dev="sda" bus="sata"/>
  <boot order="1"/>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>

Assuming this is supposed to be a raw disk, that target dev=sda could be an indicator that libvirt is aware it's about to use a full disk and not just a single partition of said disk.

This setup you're using is a little weird, but in theory it should be fine. I assume partition 1 of that JS600 disk contains an entire extra/nested GUID partition table, with more partitions inside it (including an EFI partition) for Windows to boot from? That should work just fine when passed through whole as raw, even though it looks funny from the host's point of view.

Yup, exactly. To modify it from the host I need to do something a little wild with a loopback device, but it seems to be working as intended.
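For reference, the loopback trick looks roughly like this (loop0 is just whatever losetup happens to hand out; the nested partitions then show up as loop0p1 through loop0p4):

$ sudo losetup -fP --show /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1
/dev/loop0
$ sudo mount /dev/loop0p1 /mnt     # poke at the nested EFI partition, for example
$ sudo umount /mnt && sudo losetup -d /dev/loop0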

Main drive:

   ~  sudo fdisk -l /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD                                                                                         ✔ 
Disk /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD: 1.86 TiB, 2048408248320 bytes, 4000797360 sectors
Disk model: JS600 2TB       
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: [redacted, but it's different than the other one]

Device                                                Start        End    Sectors  Size Type
/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1  2048 4000796671 4000794624  1.9T Linux filesystem

Partition-containing-its-own-nested-GPT-filesystem:

   ~  sudo fdisk -l /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1                                                                                 1 ✘ 
[sudo] password for zorba: 
Disk /dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1: 1.86 TiB, 2048406847488 bytes, 4000794624 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: [redacted]

Device                                                           Start        End    Sectors  Size Type
/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1-part1       2048     206847     204800  100M EFI System
/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1-part2     206848     239615      32768   16M Microsoft reserved
/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1-part3     239616 3999704142 3999464527  1.9T Microsoft basic data
/dev/disk/by-id/ata-JS600_2TB_SSD2TBABCDABCDABCD-part1-part4 3999707136 4000792575    1085440  530M Windows recovery environment

And yes, I am slightly entertained at the existence of ata-JS600_2TB_SSD2TBABCDABCDABCD-part1-part3 :)

I'm pretty sure the drive is showing up as a valid drive in some sense, because I was able to find the right EFI bootloader and run it, and it did start booting Windows. It just halted partway through to give me the bluescreen. If it can get far enough to even load the concept of a bluescreen then clearly something's going kinda right.

I still think the problem is that the disk is somehow showing up as the wrong kind of drive and/or on the wrong device - maybe Windows is expecting the SATA disk it gets under virt-manager, while the script exposes it as virtio, and something about that isn't carrying over. But I could be incorrect on that.

--

A bit of an update: after posting the above issue, I found I couldn't get the VM working via virt-manager anymore. I eventually tracked this down to Error 43 on the AMD GPU, and concluded that maybe the script had done something messy when detaching from that device (I did try some early attempts with the GPU attached VFIO-wise). Rebooting solved it; it's now working fine again with virt-manager.
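(If it happens again I'll try handing the card back to amdgpu by hand instead of rebooting - roughly the below, where 0000:0b:00.0 is a placeholder PCI address rather than my real one, and with no guarantee the card comes back cleanly given AMD reset quirks:)

echo 0000:0b:00.0 | sudo tee /sys/bus/pci/devices/0000:0b:00.0/driver/unbind    # detach from vfio-pci
echo | sudo tee /sys/bus/pci/devices/0000:0b:00.0/driver_override               # clear any override left behind
echo 0000:0b:00.0 | sudo tee /sys/bus/pci/drivers/amdgpu/bind                   # rebind to amdgpu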

But I'm overall confused here because I was switching back and forth pretty easily before, so what broke? When did it break? I have no idea.

I don't think this has anything to do with the boot error, however, because (1) I wasn't even using that option during the later tests so the GPU shouldn't have been relevant, and (2) "EFI error about filesystem while booting" kinda trumps a GPU problem. I would just feel like a jerk if I didn't mention it and it turned out to be relevant.

Coming back to this, I made a 10GB zvol to test this with: zfs create -V 10G zpool/nestparttest

Then:

  1. Made part1 in gdisk taking up the entire 10GB with the default "Linux filesystem" partition type.
  2. Ran gdisk again, this time against that single 10GB partition, and created a 50MB EFI partition plus a second Linux partition taking up the remaining space.
  3. Started a VM with not just the disk but explicitly partition 1, so the guest receives the nested layout: $ ~/vfio/main -image /dev/zvol/zpool/nestparttest-part1 -imageformat raw -iso /data/ISOs/archlinux-2023.02.01-x86_64.iso -run

I ran fdisk -l inside the guest and it successfully sees both nested partitions, despite being handed only the single host partition that contains them.

Next I wiped the nested partitions, ran archinstall for a basic ext4 installation, and rebooted the VM - it had no trouble at all booting into this nested-partition environment.
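For reference, the whole test condensed into non-interactive commands is roughly the following (sgdisk in place of interactive gdisk; 8300 and ef00 are the GPT type codes for "Linux filesystem" and "EFI System"):

zfs create -V 10G zpool/nestparttest
sgdisk -n 1:0:0 -t 1:8300 /dev/zvol/zpool/nestparttest                               # one partition spanning the whole zvol
sgdisk -n 1:0:+50M -t 1:ef00 -n 2:0:0 -t 2:8300 /dev/zvol/zpool/nestparttest-part1   # nested GPT inside that partition
~/vfio/main -image /dev/zvol/zpool/nestparttest-part1 -imageformat raw -iso /data/ISOs/archlinux-2023.02.01-x86_64.iso -run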


With this result, I believe the problem you're experiencing isn't related to your disk's partition nesting but to something else.

You mentioned your Windows VM is bluescreening - could you provide the bluescreen details?

Otherwise you may wish to try the --novirtio flag, in case the Windows bootloader doesn't have a virtio driver available and is getting stuck trying to access C:.

So I'm afraid I may not be able to help with this further; I've moved away from the drive-in-partition model so I can boot directly off my "VM", and may be moving away from a VM model entirely because it's proving to be kind of a hassle.

You're welcome to close this issue given that I can't really provide more help - if someone else shows up with the same problem later, maybe this will turn out to be useful in retrospect, but it's probably dead-ended for now unfortunately.