/xhyve

xhyve, a lightweight OS X virtualization solution

Primary LanguageCOtherNOASSERTION

About

The xhyve hypervisor is a port of bhyve to macOS. It is built on top of Hypervisor.framework in OS X 10.10 Yosemite and higher, runs entirely in userspace, and has no other dependencies. It can run FreeBSD, some Linux distributions, and Windows 10 and may gain support for other guest operating systems in the future.

License: BSD-2-Clause

Introduction: http://www.pagetable.com/?p=831

Requirements

  • OS X 10.10.3 Yosemite or later
  • a 2010 or later Mac (i.e. a CPU that supports EPT: sysctl kern.hv_support = 1)

Installation

If you have homebrew, then simply:

$ brew update
$ brew install --HEAD xhyve

The --HEAD in the brew command ensures that you always get the latest changes, even if the homebrew database is not yet updated. If for any reason you don't want that simply do brew install xhyve .

If you have MacPorts, then simply:

$ sudo port selfupdate
$ sudo port install xhyve

MacPorts is up to date with the GitHub ref listed in the port info

$ port info xhyve
xhyve @20170117 (emulators)
...

Otherwise:

Building

$ git clone https://github.com/machyve/xhyve.git
$ cd xhyve
$ xcodebuild

The resulting binary will be in build/Release/xhyve

Usage

$ xhyve -h

See below for steps to boot various OSs

What is bhyve?

bhyve is the FreeBSD hypervisor, roughly analogous to KVM + QEMU on Linux. It has a focus on simplicity.

It exposes the following peripherals to virtual machines:

  • Local x(2)APIC
  • IO-APIC
  • 8259A PIC
  • 8253/8254 PIT
  • HPET
  • PM Timer
  • RTC
  • PS/2 Keyboard and Mouse (via VNC)
  • PCI
    • host bridge
    • passthrough
    • UART
    • AHCI (i.e. HDD and CD)
    • VirtIO block device
    • VirtIO networking
    • VirtIO RNG
    • Intel e1000 (aka e82545)
    • VGA/Framebuffer (exposed with a minimal VNC server)
    • XHCI USB support with one device defined - a tablet for Windows guest support

bhyve architecture

                                                       Linux
           I/O        VM control       FreeBSD        NetBSD
                                                      OpenBSD
         |     A        |     A           |              |
         V     |        V     |           V              V
     +-------------++-------------++-------------++-------------+
     |             ||             ||             ||             |
     |    bhyve    ||  bhyvectl   ||  bhyveload  || grub2-bhyve |
     |             ||             ||             ||             |
     |             ||             ||             ||             |
     +-------------++-------------++-------------++-------------+
     +----------------------------------------------------------+
     |                        libvmmapi                         |
     +----------------------------------------------------------+
                                   A
                                   |                         user
     ------------------------------┼------------------------------
                                   | ioctl         FreeBSD kernel
                                   V
                     +----------------------------+
                     |        VMX/SVM host        |
                     |       VMX/SVM guest        |
                     |   VMX/SVM nested paging    |
                     |           Timers           |
                     |         Interrupts         |
                     +----------------------------+
                      vmm.ko

vmm.ko

The bhyve FreeBSD kernel module. Manages VM and vCPU objects, the guest physical address space and handles guest interaction with PIC, PIT, HPET, PM Timer, x(2)APIC and I/O-APIC. Contains a minimal x86 emulator to decode guest MMIO. Executes the two innermost vCPU runloops (VMX/SVM and interrupts/timers/paging). Has backends for Intel VMX and AMD SVM. Provides an ioctl and mmap API to userspace.

libvmmapi

Thin abstraction layer between the vmm.ko ioctl interface and the userspace C API.

bhyve

The userspace bhyve component (kind of a very light-weight QEMU) that executes virtual machines. Runs the guest I/O vCPU runloops. Manages ACPI, PCI and all non in-kernel devices. Interacts with vmm.ko through libvmmapi.

bhyvectl

Somewhat superfluous utility to introspect and manage the life cycle of virtual machines. Virtual machines and vCPUs can exist as kernel objects independently of a bhyve host process. Typically used to delete VM objects after use. Odd architectural choice.

bhyveload

Userspace port of the FreeBSD bootloader. This is a cumbersome workaround to bootstrap a FreeBSD guest operating system without using firmware. It creates a VM object, loads the FreeBSD kernel into guest memory, sets up the initial vCPU state and then exits. Only then a VM can be executed by bhyve.

grub2-bhyve

Performs the same function as bhyveload but is a userspace port of GRUB2. It is used to bootstrap guest operating systems other than FreeBSD, i.e. Linux, OpenBSD and NetBSD.

xhyve architecture

    +----------------------------------------------------------+
    | xhyve                                                    |
    |                                                          |
    |                            I/O                           |
    |                                                          |
    |                                                          |
    |                                                          |
    |+--------------------------------------------------------+|
    ||  vmm                   VMX guest                       ||
    ||                          Timers                        ||
    ||                        Interrupts                      ||
    |+--------------------------------------------------------+|
    +----------------------------------------------------------+
    +----------------------------------------------------------+
    |                   Hypervisor.framework                   |
    +----------------------------------------------------------+
                                  A
                                  |                         user
    ------------------------------┼------------------------------
                                  |syscall            xnu kernel
                                  V
    
                               VMX host
                           VMX nested paging

xhyve shares most of the code with bhyve but is architecturally very different. Hypervisor.framework provides an interface to the VMX VMCS guest state and a safe subset of the VMCS control fields, thus making userspace hypervisors without any additional kernel extensions possible. The VMX host state and all aspects of nested paging are handled by the macOS kernel, you can manage the guest physical address space simply through mapping of regions of your own address space.

xhyve is equivalent to the bhyve process but gains a subset of a userspace port of the vmm kernel module. SVM, PCI passthrough and the VMX host and EPT aspects are dropped. The vmm component provides a libvmmapi compatible interface to xhyve. Hypervisor.framework seems to enforce a strict 1:1 relationship between a host process/VM and host thread/vCPU, that means VMs and vCPUs can only be interacted with by the processes and threads that created them. Therefore, unlike bhyve, xhyve needs to adhere to a single process model. Multiple virtual machines can be created by launching multiple instances of xhyve. xhyve retains most of the bhyve command line interface.

bhyvectl, bhyveload and grub2-bhyve are incompatible with a single process model and are dropped. xhyve supports the Linux kexec protocol, a very simple and straightforward way to bootstrap a Linux kernel. It takes a bzImage and optionally initrd image and kernel parameter string as input.

xhyve can now boot an OS via EFI. The BSD-licensed TianoCore EFI built for bhyve can be used to boot Windows and other OSs.

Networking

If you want the same IP address across VM reboots, assign a UUID to a particular VM:

$ xhyve [-U uuid]

Optional:

If you need more advanced networking and already have a configured TAP device you can use it with:

virtio-tap,tapX

instead of:

virtio-net

Where X is your tap device, i.e. /dev/tapX.

Booting TinyCoreLinux

Everything needed to boot TinyCoreLinux is included with xhyve.

Steps:

  • From Terminal, launch the xhyverun-tinycorelinux.sh script in your xhyve directory.

Booting FreeBSD (via userboot)

Requirements:

  • A FreeBSD iso image. This can be downloaded from FreeBSD.org

Steps:

  • Build xhyve with make (type make in Terminal from your xhyve directory) - this will build an unsigned copy of xhyve
  • mkfile 5g FreeBSD.dmg - Create a blank image to install to
  • Use your favorite text editor to edit the xhyverun-freebsd.sh script and properly set the paths to the iso and disk image
  • Run the script from Terminal with sudo: sudo ./xhyverun-freebsd.sh and enter your admin password

Known Issues:

  • This will only work with an unsigned build of xhyve - See Codesigning/Entitlements in Issues

Booting Windows (via EFI)

Now that xhyve has support for a framebuffer, EFI and the e1000 NIC, xhyve can now run Windows in a VM.

Requirements:

  • A Windows 10 iso image. This can be downloaded from Microsoft: Windows 10 iso
  • A license key for Windows.
  • The bhyve EFI - this can be downloaded in FreeBSD via pkg install bhyve-firmware - the EFI will be named "BHYVE_UEFI.fd" and installed into /usr/local/share/uefi-firmware. Copy that file to macOS.
  • A VNC client - these vary greatly in speed and willingness to connect to the fairly minimal VNC server built into the framebuffer code.

Steps:

  • mkfile 20g Windows.dmg - Create a blank image to install to
  • Use your favorite text editor to edit the xhyverun-windows.sh script and properly set the paths to the iso, disk image and BHYVE_UEFI.fd
  • From Terminal, launch the modified xhyverun-windows.sh script in your xhyve directory.
  • Connect the VNC client to 127.0.0.1:29000

Known Issues:

  • Windows does not recognize more than one CPU and the CPU device is missing from Device Manager
  • The e1000 emulation works, but is incomplete - network statistics don't appear in Task Manager or Resource Monitor
  • Mouse positioning in VNC is wacky - this is due to the nature of how mouse deltas are passed to the VM and what Windows does to them afterwards. Once Windows is installed, one can enable remote connection and connect with Microsoft Remote Desktop instead of the VNC client.

Issues

Virtual Box

If you are, or were, running any version of VirtualBox, prior to 4.3.30 or 5.0, and attempt to run xhyve, your system will immediately crash as a kernel panic is triggered. This is due to a VirtualBox bug (that got fixed in newest VirtualBox versions) as VirtualBox wasn't playing nice with OSX's Hypervisor.framework used by xhyve.

To get around this you either have to update to newest VirtualBox 4.3 or 5.0 or, if you for some reason are unable to update, to reboot your Mac after using VirtualBox and before attempting to use xhyve. (see issues #5 and #9 for the full context)

Code signing/Entitlements

macOS limits access to the networking API (vmnet) to builds that are code signed and have the appropriate entitlement. The code signing/entitlement requirement can be bypassed by running xhyve as root (via sudo). A code signed build cannot run FreeBSD via the userboot.so bootloader as that requires loading and executing code that is outside the code signature (even as root). Building xhyve via xcodebuild signs the build. Building xhyve via make does not.

TODO

  • vmm:
    • enable APIC access page to speed up APIC emulation (performance)
    • enable x2APIC MSRs (even faster) (performance)
    • vmm_callout:
      • is a quick'n'dirty implementation of the FreeBSD kernel callout mechanism
      • seems to be racy
      • fix races or perhaps replace with something better
      • use per vCPU timer event thread (performance)?
      • use hardware VMX preemption timer instead of pthread_cond_wait (performance)
    • some 32-bit guests are broken (support PAE paging in VMCS)
    • PCID guest support (performance)
  • block_if:
    • macOS does not support preadv/pwritev, we need to serialize reads and writes for the time being until we find a better solution. (performance)
    • support block devices other than plain files
  • virtio_net:
    • unify TAP and vmnet backends
    • vmnet: send/receive more than a single packet at a time (performance)
  • virtio_rnd:
    • is untested
  • e1000
    • is untested beyond basically working with Windows
    • fix missing statistics (see Booting Windows)
    • add support for TAP
    • tune performance (performance)
  • framebuffer
    • is untested beyond basically working with Windows
    • VNC: add support for more modern connections (ssh) to improve security and compatibility with more clients
    • add an option to share the framebuffer with a client UI app (specifics need defining) (performance)
  • UEFI
    • is untested beyond basically working with Windows
    • Needs testing against various OSs
    • figure out why Windows doesn't properly see more than one CPU (might also involve ACPI)
  • XHCI
    • Move XHCI support and the tablet device to xhyve
  • remove explicit state transitions:
    • since only the owning task/thread can modify the VM/vCPUs a lot of the synchronization might be unnecessary (performance)
  • performance, performance and performance
  • remove vestigial code, cleanup