
OpenShift Workstation with Single GPU passthrough

Introduction

This article describes a way to run OpenShift as a workstation with GPU PCI passthrough, using Container Native Virtualization (CNV) to provide a virtualized desktop experience running on a single OpenShift node.

This setup is used to run Microsoft Flight Simulator in a Windows VM with performance close to a bare-metal Windows installation.

In the future, a few improvements will be worked on:

* Reducing the Control Plane footprint by relying on MicroShift instead.
* Using the GPU from containers instead of virtual machines for the Linux Desktop.

Hardware description

The workstation in use for this demo is a compact, mini-ITX workstation with the following hardware:

* AMD Ryzen 9 3950X 16-Core 32-Threads
* 64GB DDR4
* Nvidia RTX 3080
* 2x NVMe Disks
* 1x SSD Disk
* Single NIC 1GB

Backup of existing system partitions

In order to avoid boot order conflicts, the OpenShift assisted installer wipes the first 512 bytes of any disk that contains a bootable partition.

Therefore, to avoid losing an existing system, it is important to back up and then remove any existing EFI/BIOS and boot partitions from the disks already present.
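
For example (a minimal sketch; device and partition names are placeholders to adapt to your actual layout):

# Keep a copy of the first sectors (MBR/GPT headers) of the disk
dd if=/dev/nvme0n1 of=nvme0n1-first-sectors.img bs=512 count=2048
# Archive the content of an existing EFI system partition
mount /dev/nvme0n1p1 /mnt
tar czf nvme0n1p1-efi-backup.tar.gz -C /mnt .
umount /mnt
# Remove the filesystem signature so the partition is no longer bootable
wipefs --all /dev/nvme0n1p1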

Installing OpenShift SNO

Once any existing file system is backed up and no bootable partitions remain, we can proceed with the OpenShift Single Node install.

It is important to note that CoreOS, the underlying operating system, requires an entire disk for installation.

Here, we will keep the two NVMe disks for the persistent volumes, as LVM Physical Volumes belonging to the same Volume Group, and we will use the SSD disk for the OpenShift operating system.
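
For example, from a host shell (a minimal sketch; the Volume Group name and device paths are assumptions to adapt to your hardware):

# Turn both NVMe disks into Physical Volumes inside one Volume Group
pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate vg_workstation /dev/nvme0n1 /dev/nvme1n1
# Carve a Logical Volume that will later back a virtual machine disk
lvcreate -L 120G -n fedora35 vg_workstation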

link:install/get-ocp-binaries.sh[role=include]
install-config.yaml
link:install/install-config.yaml[role=include]
Generate OpenShift Container Platform assets:
mkdir ocp && cp install-config.yaml ocp
openshift-install --dir=ocp create single-node-ignition-config
Embed the ignition data into the RHCOS ISO:
alias coreos-installer='podman run --privileged --rm \
      -v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
      -w /data quay.io/coreos/coreos-installer:release'
cp ocp/bootstrap-in-place-for-live-iso.ign iso.ign
coreos-installer iso ignition embed -fi iso.ign rhcos-live.x86_64.iso
dd if=rhcos-live.x86_64.iso of=/dev/usbkey status=progress

Once the ISO is copied to the USB drive, you can use the USB drive to boot your workstation node and install OpenShift Container Platform.

Install CNV Operator

Activate Intel VT or AMD-V hardware virtualization extensions in BIOS.
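
You can confirm the extensions are enabled from a host shell on the node (for example via oc debug node/<node-name>); no output means they are still disabled:

# svm = AMD-V, vmx = Intel VT-x
grep -Eo 'svm|vmx' /proc/cpuinfo | sort -u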

cnv-resources.yaml
link:install/cnv-resources.yaml[role=include]
oc apply -f cnv-resources.yaml

Installing the virtctl client on your desktop

subscription-manager repos --enable cnv-4.10-for-rhel-8-x86_64-rpms
dnf install kubevirt-virtctl
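
Once a virtual machine exists (such as the fedora35 VM defined later in this article), virtctl drives its lifecycle from your desktop:

virtctl start fedora35
virtctl console fedora35
virtctl stop fedora35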

Remove Local Storage operator (if installed)

As we do not need to manage LVM volumes automatically, we would like to avoid Logical Volumes being automatically formatted once they are deleted from OpenShift.

While this could lead to data leaks in a multi-tenant environment, removing the Local Storage Operator also avoids losing your Virtual Machine partitions once you delete a VM.

Configure OpenShift for single GPU passthrough

As our GPU is the only one attached to the node, a few additional steps are required.

We will use MachineConfig to configure our node accordingly.

All MachineConfigs are applied to the master machine config pool because this is a single-node OpenShift cluster. On a multi-node cluster, they would be applied to the worker pool instead.

Adding IOMMU and VGA off Kernel arguments

These arguments prevent the host system from binding the console to the GPU.

100-sno-kernelargs.yaml
link:machineconfig/100-sno-kernelargs.yaml[role=include]
Note
If you’re using an Intel CPU you’ll have to set intel_iommu=on instead.

We deactivate the EFI framebuffer with the video=efifb:off argument.
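
The linked file carries the actual arguments; a minimal sketch of such a MachineConfig, assuming the AMD CPU of this build, could look like:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 100-sno-kernelargs
spec:
  kernelArguments:
    - amd_iommu=on      # enable the IOMMU (intel_iommu=on on Intel CPUs)
    - iommu=pt          # passthrough mode, only translate DMA for assigned devices
    - video=efifb:off   # keep the host console off the passthrough GPU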

Binding GPU to VFIO Driver at boot time

We first gather the PCI Vendor and product IDs using pciutils.

lspci -nn | grep VGA
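
On this build the output looks like the following; the PCI address is an example, while the vendor:device pair 10de:2206 is the one used below (the GPU audio function, 10de:1aef, appears as a separate device on the same slot):

0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
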
100-sno-vfiopci.bu
link:machineconfig/100-sno-vfiopci.bu[role=include]
dnf install butane
butane 100-sno-vfiopci.bu -o 100-sno-vfiopci.yaml
oc apply -f 100-sno-vfiopci.yaml
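
The butane file itself is linked above; a minimal sketch of what it can contain, following the pattern documented for OpenShift PCI passthrough (a modprobe option binding our two GPU functions to vfio-pci, plus loading the module at boot):

variant: openshift
version: 4.10.0
metadata:
  name: 100-sno-vfiopci
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /etc/modprobe.d/vfio.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        options vfio-pci ids=10de:2206,10de:1aef
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci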

Unbinding VTConsole at boot time

98-sno-vtconsole-unbind.yaml
link:machineconfig/98-sno-vtconsole-unbind.yaml[role=include]
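
The linked MachineConfig contains the actual unit; a hedged sketch of the approach, a oneshot systemd unit that unbinds the virtual console from the framebuffer at boot:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-vtconsole-unbind
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: vtconsole-unbind.service
        enabled: true
        contents: |
          [Unit]
          Description=Unbind vtconsole from the GPU framebuffer
          [Service]
          Type=oneshot
          ExecStart=/bin/sh -c 'echo 0 > /sys/class/vtconsole/vtcon0/bind'
          [Install]
          WantedBy=multi-user.target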

Add GPU as Hardware Device of your node

oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10DE:2206"
      resourceName: "nvidia.com/GEFORCE_RTX_3080"
    - pciDeviceSelector: "10DE:1AEF"
      resourceName: "nvidia.com/GEFORCE_RTX_3080_AUDIO"
...

Pass through one of the USB Host Controllers to the VM

In order to connect a mouse, keyboard, audio device, etc. directly to the VM, we pass one of the USB controllers through to it.

Identify a USB Controller and its IOMMU group

We first need to identify it using pciutils.

lspci -nnk

After selecting the USB controller we want to dedicate to the virtual machine, we should verify that it is the only PCI device in its IOMMU group. We first look for the PCI address in the iommu_groups folder structure, then list the PCI addresses belonging to this IOMMU group.

find /sys/kernel/iommu_groups/ -iname "*0b:00.3*"
ls /sys/kernel/iommu_groups/27/devices/
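
If the controller is properly isolated, the first command returns a single path and the group lists only that device:

/sys/kernel/iommu_groups/27/devices/0000:0b:00.3
0000:0b:00.3

All devices in an IOMMU group can only be passed through together, so pick a controller that has a group of its own.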

Add the USB Controller as Hardware Device of your node

Once the controller is identified, we add its Vendor and product IDs to the list of permitted Host Devices.

Currently, KubeVirt does not allow providing a specific PCI address, therefore the pciDeviceSelector will match all similar USB Host Controllers on the node. However, as we will bind only the one we are interested in to the VFIO-PCI driver, the other ones will not be available for PCI passthrough.

oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "1222:164F"
      resourceName: "amd.com/XHCI_USB3_Controller"
...

Binding the USB Controller to VFIO-PCI driver at boot time

98-sno-xhci-unbind.yaml
link:machineconfig/98-sno-xhci-unbind.yaml[role=include]
Caution
Backslashes in the PCI device ID path must be doubled in the systemd unit; see https://www.freedesktop.org/software/systemd/man/systemd.syntax.html
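
The linked MachineConfig holds the real unit; the mechanism can be sketched as a oneshot unit along these lines (the PCI address is the controller identified above; this is an assumption about the repo's exact approach):

[Unit]
Description=Bind the dedicated USB controller to vfio-pci
[Service]
Type=oneshot
# Detach the controller from the regular USB driver first
ExecStart=/bin/sh -c 'echo 0000:0b:00.3 > /sys/bus/pci/drivers/xhci_hcd/unbind'
# Force this single device (not every controller with the same IDs) onto vfio-pci
ExecStart=/bin/sh -c 'echo vfio-pci > /sys/bus/pci/devices/0000:0b:00.3/driver_override'
ExecStart=/bin/sh -c 'echo 0000:0b:00.3 > /sys/bus/pci/drivers_probe'
[Install]
WantedBy=multi-user.target

Per the caution above, any literal backslash appearing in such a unit (for example systemd's \x2d escapes in device paths) must be written doubled once the unit is embedded in the MachineConfig.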

Creating Virtual Machine

The virtual machine will use existing LVM Logical Volumes; here we will assume the operating system is already installed on the LV with UEFI boot.

Create PV and PV Claim out of local LVM disks

fedora35.yaml
link:pv/fedora35.yaml[role=include]
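
The linked file contains the actual definitions; a hedged sketch of a local, raw-block PersistentVolume and its claim for the LVM Logical Volume (VG/LV names, size, storage class, and hostname are assumptions):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fedora35
spec:
  capacity:
    storage: 120Gi
  volumeMode: Block                       # hand the raw LV to the VM, no filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # never wipe the LV on release
  storageClassName: local
  local:
    path: /dev/vg_workstation/fedora35
  nodeAffinity:                           # local volumes must be pinned to their node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sno-workstation
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora35
spec:
  volumeMode: Block
  accessModes:
  - ReadWriteOnce
  storageClassName: local
  volumeName: fedora35                    # bind explicitly to the PV above
  resources:
    requests:
      storage: 120Gi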

Defining the Virtual Machine

The virtual machines we will use as desktops come with a few specificities:

* We will pass through the entire GPU. Ref: https://kubevirt.io/2021/intel-vgpu-kubevirt.html
* We will remove the existing default virtual VGA. Ref: https://kubevirt.io/api-reference/master/definitions.html#_v1_devices
* We will pass through an entire USB controller.
* We will use UEFI boot to stay closer to a typical Bare Metal install. Ref: https://docs.openshift.com/container-platform/4.10/virt/virtual_machines/advanced_vm_management/virt-efi-mode-for-vms.html

fedora35.yaml
link:vms/fedora35.yaml[role=include]
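
The complete definition is in vms/fedora35.yaml; below is a trimmed sketch of the parts implementing the four points above (the deviceName values match the permittedHostDevices configured earlier; memory size and volume names are assumptions):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora35
spec:
  running: false
  template:
    spec:
      domain:
        firmware:
          bootloader:
            efi:
              secureBoot: false                # UEFI boot, close to bare metal
        devices:
          autoattachGraphicsDevice: false      # drop the default virtual VGA
          gpus:
          - deviceName: nvidia.com/GEFORCE_RTX_3080    # full GPU passthrough
            name: gpu
          hostDevices:
          - deviceName: nvidia.com/GEFORCE_RTX_3080_AUDIO
            name: gpuaudio
          - deviceName: amd.com/XHCI_USB3_Controller   # dedicated USB controller
            name: usb3
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 16Gi
      volumes:
      - name: rootdisk
        persistentVolumeClaim:
          claimName: fedora35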

Next

This chapter is used as a reference for the future improvements to make.