cockpit-project/cockpit-machines

VM: Improve Windows VM performance

garrett opened this issue · 4 comments

The Windows VM had a severe performance hit, which required editing the VM's XML file directly to disable several scheduling systems and pin specific CPU cores as per this guide. I feel this should be partly the default configuration for Windows guests, and partly exposed in the settings page for the VM.

Originally posted by @Betonhaus in #680 (comment)

Here's a copy and paste of the comment at that link (just in case it goes away):

I run a Windows 11 virtual machine with good performance on my Chromebook with a 10th gen i5 and 16GB of RAM (similar specs as your laptop).

Looking at your libvirt XML, there are a few optimizations you can make:

Apply all available Hyper-V enlightenments - the <hyperv> section of your XML (under <features>) should look like this:

<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='on'/>
  <synic state='on'/>
  <stimer state='on'>
    <direct state='on'/>
  </stimer>
  <reset state='on'/>
  <frequencies state='on'/>
  <reenlightenment state='on'/>
  <tlbflush state='on'/>
  <ipi state='on'/>
</hyperv>

Disable all timers except for the hypervclock - the <clock> section of your XML should look like this:

<clock offset='localtime'>
  <timer name='rtc' present='no' tickpolicy='catchup'/>
  <timer name='pit' present='no' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
  <timer name='kvmclock' present='no'/>
  <timer name='hypervclock' present='yes'/>
</clock>

Those two improvements alone should result in a massive speedup.
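
To see how far an existing domain is from the two sections above, a small audit script can help. The following is only a sketch using Python's standard library: the function names and the SAMPLE domain are made up for illustration, and it assumes the XML comes from something like `virsh dumpxml`.

```python
import xml.etree.ElementTree as ET

# Enlightenments listed in the <hyperv> section quoted above.
RECOMMENDED = ["relaxed", "vapic", "spinlocks", "vpindex", "synic", "stimer",
               "reset", "frequencies", "reenlightenment", "tlbflush", "ipi"]

# Hypothetical, heavily trimmed domain XML used only for demonstration.
SAMPLE = """<domain type='kvm'>
  <features>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
</domain>"""


def missing_enlightenments(domain_xml: str) -> list[str]:
    """Return recommended Hyper-V features that are absent or not state='on'."""
    root = ET.fromstring(domain_xml)
    hyperv = root.find("./features/hyperv")
    missing = []
    for name in RECOMMENDED:
        elem = hyperv.find(name) if hyperv is not None else None
        if elem is None or elem.get("state") != "on":
            missing.append(name)
    return missing


def listed_active_timers(domain_xml: str) -> list[str]:
    """Return timers that appear under <clock> without present='no'."""
    root = ET.fromstring(domain_xml)
    return [t.get("name") for t in root.findall("./clock/timer")
            if t.get("present") != "no"]


print(missing_enlightenments(SAMPLE))
print(listed_active_timers(SAMPLE))
```

Note that this only inspects timers that are explicitly listed; a timer that is missing from the XML altogether is not reported either way.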

Further improvements can be made, though. I recommend using CPU pinning - this forces each virtual CPU to be pinned to a physical CPU core (or in this case, virtual Crostini core), reducing the performance overhead from the kernel constantly swapping the virtual CPUs to different threads. For example, I do the following (6 cores for the VM on an 8-core host):

<vcpu placement='static'>6</vcpu>
<iothreads>1</iothreads>
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='6'/>
  <vcpupin vcpu='4' cpuset='3'/>
  <vcpupin vcpu='5' cpuset='7'/>
  <emulatorpin cpuset='0,4'/>
  <iothreadpin iothread='1' cpuset='0,4'/>
</cputune>
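
Because the pinning layout depends on the host, it can be easier to generate the <cputune> block than to type it by hand. Here is a minimal sketch (Python 3.9+); `build_cputune` and its parameters are hypothetical names, and the CPU numbers simply reproduce the example above. Which host CPUs belong together is an assumption you need to verify yourself, for example with `lscpu -e`.

```python
import xml.etree.ElementTree as ET


def build_cputune(vcpu_cpus: list[int], housekeeping: list[int],
                  iothreads: int = 1) -> str:
    """Pin vCPU i to vcpu_cpus[i]; pin emulator and I/O threads to housekeeping."""
    if set(vcpu_cpus) & set(housekeeping):
        raise ValueError("vCPU and housekeeping CPU sets overlap")
    cputune = ET.Element("cputune")
    for vcpu, cpu in enumerate(vcpu_cpus):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(cpu))
    spare = ",".join(str(c) for c in housekeeping)
    ET.SubElement(cputune, "emulatorpin", cpuset=spare)
    # libvirt numbers I/O threads starting at 1.
    for n in range(1, iothreads + 1):
        ET.SubElement(cputune, "iothreadpin", iothread=str(n), cpuset=spare)
    ET.indent(cputune)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(cputune, encoding="unicode")


# Same layout as the example: six vCPUs, host CPUs 0 and 4 left for the rest.
xml_text = build_cputune([1, 5, 2, 6, 3, 7], housekeeping=[0, 4])
print(xml_text)
```

The overlap check is a design choice of this sketch, not a libvirt requirement: it keeps the emulator and I/O threads off the CPUs reserved for the guest.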

I highly recommend using virtio as your disk type, as this allows disk access to be paravirtualized, further reducing overhead. This requires driver support on the Windows side, though - the easiest way to enable this is to reinstall Windows and, when partitioning the disk, insert the virtio-win drivers ISO into your virtual machine so that the disk can be recognized during setup. I'm using this in my XML:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='threads' discard='unmap' iothread='1' queues='6'/>
  <source file='/var/lib/libvirt/images/win11.qcow2'/>
  <target dev='vda' bus='virtio'/>
  <boot order='2'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</disk>

Finally, make sure you install the Spice guest tools to improve how the VM handles mouse input between the guest and host, and to automatically change the VM's resolution when the window resizes.

A follow-up reply to that mentions you can switch the disk to virtio too, without reinstalling:

Thanks for this, applied it all and works like a dream.

Just on the virtio for disk, I've managed to switch my existing disk over, without a reinstall. I followed this:

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Virtio_disk

You add a new temp disk on virtio, start Windows, install the drivers, then shut down again. I then removed the temp disk and the SATA disk, and re-added the latter as virtio. Worked for me without any further tweaks.
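
The last step (re-adding the existing disk as virtio) amounts to a small change in the domain XML. The sketch below illustrates it with Python's standard library; the function name, the `sda`/`vda` device names and the SAMPLE domain are assumptions for illustration. It only makes sense once the virtio drivers are installed in the guest, and the VM should be shut down first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed domain XML with a single SATA disk.
SAMPLE = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/win11.qcow2'/>
      <target dev='sda' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
  </devices>
</domain>"""


def switch_disk_to_virtio(domain_xml: str, old_dev: str = "sda",
                          new_dev: str = "vda") -> str:
    """Return domain XML with the disk at old_dev moved to the virtio bus."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != old_dev:
            continue
        target.set("dev", new_dev)
        target.set("bus", "virtio")
        # The old drive-type address belongs to the SATA controller; drop it
        # so libvirt can assign a new address when the domain is redefined.
        address = disk.find("address")
        if address is not None:
            disk.remove(address)
        return ET.tostring(root, encoding="unicode")
    raise LookupError(f"no disk with target dev={old_dev!r}")


print(switch_disk_to_virtio(SAMPLE))
```

The result could then be loaded back with `virsh define`; keep a copy of the original XML so the change is easy to revert.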

The CPU-related optimizations feel more like tweaking to me than something which cockpit-machines should do. I wonder why virt-install doesn't do the right thing? (But note that CPU pinning is highly machine-specific, so we can't generalize that part at all.)

You can already select virtio as the bus type in Machines, so that would not require any new changes.

The most critical Win10/2016+ related domain XML tuning should be the following:

# add under <hyperv>
<synic state='on'/>
<stimer state='on'/>
<vpindex state='on'/>

# add under <clock>
<timer name='hypervclock' present='yes'/>

I would avoid more invasive changes, such as disabling all other timer sources under <clock> and especially CPU pinning (which can be exposed to the user, but should not be set automatically).
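
For illustration, the minimal tuning listed above could be applied to a domain definition roughly like this. It is a sketch only, not how cockpit-machines or virt-install do it; `ensure_child` and `apply_minimal_tuning` are hypothetical helpers, and existing elements and other timers are left untouched.

```python
import xml.etree.ElementTree as ET


def ensure_child(parent, tag, match=None, **attrs):
    """Find or create parent/tag (optionally matched on one attribute), set attrs."""
    for child in parent.findall(tag):
        if match is None or child.get(match) == attrs.get(match):
            break
    else:
        # No suitable existing element: create a new one.
        child = ET.SubElement(parent, tag)
    for key, value in attrs.items():
        child.set(key, value)
    return child


def apply_minimal_tuning(domain_xml: str) -> str:
    """Enable synic, stimer, vpindex and hypervclock; change nothing else."""
    root = ET.fromstring(domain_xml)
    features = ensure_child(root, "features")
    hyperv = ensure_child(features, "hyperv")
    for name in ("synic", "stimer", "vpindex"):
        ensure_child(hyperv, name, state="on")
    clock = ensure_child(root, "clock")
    ensure_child(clock, "timer", match="name", name="hypervclock", present="yes")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, trimmed input for demonstration.
before = ("<domain type='kvm'><features><hyperv><relaxed state='on'/></hyperv>"
          "</features><clock offset='localtime'/></domain>")
print(apply_minimal_tuning(before))
```

Running the function twice gives the same result as running it once, so it is safe to apply to a domain that already has some of these settings.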