ionescu007/SimpleVisor

SimpleVisor: Booting OS

Closed this issue · 23 comments

Hello,
I read that SimpleVisor doesn't support booting operating systems right now. Could you explain what problems could occur?
Right now I am experimenting with SimpleVisor. I am loading the hypervisor (as an Unrestricted Guest) from UEFI, but as soon as the kernel is running, it crashes.

EDIT: It might be a memory problem, because it crashes randomly: sometimes right at the start, sometimes at the login screen.

Hi,

It's basically untested, and guaranteed not to work with multiple processors. Even with one processor, Windows will be configuring the APIC, potentially sending the SIPI signal, and so on, and these are all things that SimpleVisor does not currently handle.

@ionescu007 we are adding support for this right now in Bareflank. We have UEFI working, and we have multi-core working, the only part that is missing is the SIPI/INIT process which is well documented here

Here is the patch to Bareflank to get UEFI working:
https://github.com/Bareflank/bfdriver/pull/3

It's not complete yet, as it doesn't have the SIPI/INIT emulation that is needed to start Windows or Linux, but the hypervisor starts up on all cores without issue. It was a couple of months ago when we wrote this, but IIRC, the two main issues were making sure that VMX was enabled (which requires trapping on modifications to CR4), and making sure the TSS was set up on each core, since UEFI doesn't do that for you. Once all of that was handled properly, it worked fine.

Note that this patch only has the driver modifications; the modifications to the hypervisor to trap on CR4 are not included.

Let me know how it goes. We will be upstreaming our patches to Bareflank over the next several weeks. Once it is completely done, I will let you know so that if you're still having issues, you can check out all of our changes.

I wanted to verify my result before responding, and I got around to doing so and tested stopping the hypervisor as well to ensure proper operation. It seems to be working without error even after the firmware suspends cores the hypervisor is resident on. (Tested on QEMU/OVMF)

The crux of the problem was that EFI's SwitchBSP function (used here to be compatible with Bareflank's common API) would swap CR4 with the core the BSP was moving to. As the VMXE bit was not set on the core the BSP was moving to, the bit would become unset on the previous (now hypervisor-resident) core during the swap. This would cause an unhandled general protection fault.
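To illustrate why losing VMXE faults, here is a minimal sketch (not code from SimpleVisor, Bareflank, or OVMF) of the fixed-bit rule that CR4 must satisfy while a core is in VMX operation. The function name `cr4_valid_in_vmx` is hypothetical; the `fixed0` and `fixed1` arguments are assumed to hold the values read from the `IA32_VMX_CR4_FIXED0` and `IA32_VMX_CR4_FIXED1` MSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_VMXE (1ULL << 13) /* CR4.VMXE is bit 13 */

/*
 * Hypothetical helper: returns true if `cr4` would be accepted while the
 * core is in VMX operation.
 *   fixed0: bits that must be 1 (assumed read from IA32_VMX_CR4_FIXED0)
 *   fixed1: bits that may be 1  (assumed read from IA32_VMX_CR4_FIXED1)
 */
static bool cr4_valid_in_vmx(uint64_t cr4, uint64_t fixed0, uint64_t fixed1)
{
    /* A bit cannot be both "must be 1" and "must be 0". */
    assert((fixed0 & ~fixed1) == 0);

    bool required_bits_set = (cr4 & fixed0) == fixed0;
    bool no_forbidden_bits = (cr4 & ~fixed1) == 0;
    return required_bits_set && no_forbidden_bits;
}
```

With VMXE reported as a must-be-1 bit, a CR4 value that has VMXE clear fails this check, which matches the general protection fault described above when the swapped-in CR4 is loaded on the hypervisor-resident core.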

Not sure if the problem you're encountering is the same, but in any case I hope this helps.

Edit: on second look, the method I used in Driver.c isn't doing the trick (derp - I had to wrap this up too quickly) and instead a modification I made to OVMF while debugging would simply turn on VMXE when swapping. The problem should be the same in nature though.

You can't use SwitchBSP to effectively change another core's VMXE, as I so naively attempted, because the setting just follows you as you change cores. You'd have to use a different MP function to do so, or perhaps the other method you describe. I do think the cleanest solution would be a CR4 exit handler, as that could handle whatever the system attempts to do to CR4 regardless (but does this have performance implications?).

I am testing with KVM in nested mode.

It just depends on how you want to handle CR4. We have not really seen an impact, since CR4 doesn't get changed all that often. Also, all hypervisors really should trap on modifications to CR4 and, at a minimum, mask off the VMXE bit so that the OS cannot accidentally turn it off.

Just my two cents. Bareflank itself doesn't handle CR4 yet, but it will, both to support UEFI and to make sure that VMXE cannot be disabled. I know that we saw an issue with newer Linux kernels because they added a shadow of CR4, and writing CR4 back from that shadow was disabling VMXE (Xen and KVM saw the same problem). This could be fixed with the same CR4 exit handler, which would prevent that bit from being flipped.
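A CR4 exit handler of the kind discussed here could look roughly like the sketch below. This is not Bareflank's or SimpleVisor's code: `struct cr4_vmcs_fields`, `cr4_setup`, and `cr4_handle_write` are hypothetical names, and the struct is only a stand-in for the three VMCS fields involved (guest CR4, the CR4 read shadow, and the CR4 guest/host mask), which a real hypervisor would access with VMREAD/VMWRITE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_VMXE (1ULL << 13) /* CR4.VMXE is bit 13 */

/* Hypothetical stand-in for the relevant VMCS fields. */
struct cr4_vmcs_fields {
    uint64_t guest_cr4;           /* value the CPU really runs the guest with */
    uint64_t cr4_read_shadow;     /* what the guest reads back for owned bits */
    uint64_t cr4_guest_host_mask; /* bits owned by the hypervisor */
};

/* Claim ownership of VMXE so guest writes that change it cause a VM exit. */
static void cr4_setup(struct cr4_vmcs_fields *f, uint64_t initial_cr4)
{
    assert(f != NULL);
    f->cr4_guest_host_mask = CR4_VMXE;
    f->cr4_read_shadow = initial_cr4;      /* guest's view of CR4 */
    f->guest_cr4 = initial_cr4 | CR4_VMXE; /* real value keeps VMXE set */
}

/* Called on a MOV-to-CR4 exit with the value the guest tried to write. */
static void cr4_handle_write(struct cr4_vmcs_fields *f, uint64_t requested)
{
    assert(f != NULL);
    f->cr4_read_shadow = requested;        /* guest sees what it wrote */
    f->guest_cr4 = requested | CR4_VMXE;   /* VMXE is forced back on */
}
```

Because only VMXE is placed in the guest/host mask, writes that leave VMXE matching the read shadow do not exit at all, which is consistent with the observation that the handler has little performance impact.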

As a side note, we also had to add a shadow of the GDT when stopping the hypervisor because Linux now marks the GDT as read-only, which prevents flipping the TSS busy bit.

When stopping the hypervisor, you are "promoting" ring 0 to ring -1. Since our VMM has its own GDT, there are two TSSs, one for Windows/Linux and one for the VMM. Both are marked busy because both are being used. The problem is that you cannot load a TSS that is marked busy, so we have to flip that bit to make it work.

Yeah, Bareflank has its own page tables, IDT, GDT, control registers, etc. Version 1.0 was like MoRE, SimpleVisor, etc., which use the same resources for both the VMM and the host OS, but with 1.1 we wanted to move to separate resources, which simplified things like memory mapping and allowed for an easier method of loading via userspace applications.

Looks like I do set up a TSS for each processor already, so I think the VMXE bit might be the issue. However, it seems the link you gave me no longer works. Where can I see the pull request now, @rianquinn?

@ainfosec-henselb Can you point @ionescu007 to what you have so far for UEFI?

Awesome, did that allow you to boot into Windows? Or does it just get SimpleVisor to run on all of the cores?

We have UEFI working completely with Windows and Linux. There are a couple of things that were needed, but in general, I think that SimpleVisor could support this easily with what is already supported. We also hyperjack EFI, and EFI reserves the resources that you are taking, so no additional work should be needed to safely boot Windows.

Check out my friend @tandasat's UEFI hypervisor which now fulfills this need:

https://github.com/tandasat/MiniVisorPkg