Bugcheck 3B while running nested in KVM
trapframe opened this issue · 7 comments
Hi,
Although DdiMon works perfectly well nested in VmWare it bugcheck systematically while running inside KVM.
Logs shows that hypervisor was succefully loaded and i confirmed that the hook functions are getting called as well.
It takes a couple of secondes to bugcheck between 5 and 20 depending of the VM activity.
i can repro on the same VM used in VmWare (Windows 7 SP1 x64 with 2 processors)
bugcheck shows :
EXCEPTION_CODE: (NTSTATUS) 0xc000001d - {EXCEPTION} Illegal Instruction An attempt was made to execute an illegal instruction.
FAULTING_IP:
+0
fffffa80`00c3f000 0f01c1 vmcall
Below is the extract of the log and the !analyze results.
I can provide the dump if needed.
13:44:57.014 DBG #1 4 2472 System Log thread started (TID= 00000000000009A8).
13:44:57.124 INF #0 4 44 System Log has been initialized.
13:44:57.124 DBG #0 4 44 System Info= FFFFF88004CE4410, Buffer= FFFFFA8000E2C000 FFFFFA8000E3C000, File= \SystemRoot\DdiMon.log
13:44:57.124 DBG #0 4 44 System PXE at fffff6fb7dbed000, PPE at fffff6fb7da00000, PDE at fffff6fb40000000, PTE at fffff68000000000
13:44:57.124 DBG #0 4 44 System Physical Memory Range: 0000000000001000 - 000000000009f000
13:44:57.124 DBG #0 4 44 System Physical Memory Range: 0000000000100000 - 000000003ffde000
13:44:57.124 DBG #0 4 44 System Physical Memory Total: 1048048 KB
13:44:57.139 DBG #0 4 44 System shared_data = FFFFFA8002684070
13:44:57.170 DBG #0 4 44 System MTRR Default=6, VariableCount=8, FixedSupported=1, FixedEnabled=1
13:44:57.170 INF #0 4 44 System Initializing VMX for the processor 0.
13:44:57.514 DBG #0 4 44 System vmm_stack_limit = FFFFFA800107A000
13:44:57.514 DBG #0 4 44 System vmm_stack_region_base = fffffa8001080000
13:44:57.514 DBG #0 4 44 System vmm_stack_data = fffffa800107fff8
13:44:57.530 DBG #0 4 44 System vmm_stack_base = fffffa800107fff0
13:44:57.530 DBG #0 4 44 System processor_data = FFFFFA80029F5210 stored at fffffa800107fff8
13:44:57.530 DBG #0 4 44 System guest_stack_pointer = fffff88002ffa660
13:44:57.530 DBG #0 4 44 System guest_inst_pointer = fffff88004c123a7
13:44:57.530 DBG #0 4 44 System IA32_VMX_CR0_FIXED0 = 80000021
13:44:57.530 DBG #0 4 44 System IA32_VMX_CR0_FIXED1 = ffffffff
13:44:57.530 DBG #0 4 44 System Original CR0 = 80050031
13:44:57.530 DBG #0 4 44 System Fixed CR0 = 80050031
13:44:57.530 DBG #0 4 44 System IA32_VMX_CR4_FIXED0 = 00002000
13:44:57.530 DBG #0 4 44 System IA32_VMX_CR4_FIXED1 = 003727ff
13:44:57.530 DBG #0 4 44 System Original CR4 = 000406f8
13:44:57.530 DBG #0 4 44 System Fixed CR4 = 000426f8
13:44:57.530 DBG #0 4 44 System VmEntryControls = 000013ff
13:44:57.530 DBG #0 4 44 System VmExitControls = 0003effb
13:44:57.530 DBG #0 4 44 System PinBasedControls = 00000016
13:44:57.530 DBG #0 4 44 System ProcessorBasedControls = 9680e172
13:44:57.545 DBG #0 4 44 System SecondaryProcessorBasedControls = 0010102e
13:44:57.545 DBG #0 4 44 System Context at FFFFF88004C123F8: rax= 0000000000000000 rbx= 0000000000000000 rcx= fffff88004cf2710 rdx= fffffa8002684070 rsi= fffffa8000dfe000 rdi= fffff88002ffa798 rsp= fffff88002ffa6e0 rbp= 0000000000000000 r8= 0000000000000065 r9= 0000000000000000 r10= 0000000000000000 r11= fffff88002ff9e60 r12= 0000000000000001 r13= ffffffff80000938 r14= fffffa80029bc780 r15= 000000000000001c efl= 00000282
13:44:57.545 INF #0 4 44 System Initialized successfully.
13:44:57.545 INF #1 4 44 System Initializing VMX for the processor 1.
13:44:57.889 DBG #1 4 44 System vmm_stack_limit = FFFFFA80012BB000
13:44:57.889 DBG #1 4 44 System vmm_stack_region_base = fffffa80012c1000
13:44:57.889 DBG #1 4 44 System vmm_stack_data = fffffa80012c0ff8
13:44:57.889 DBG #1 4 44 System vmm_stack_base = fffffa80012c0ff0
13:44:57.889 DBG #1 4 44 System processor_data = FFFFFA80029F9210 stored at fffffa80012c0ff8
13:44:57.889 DBG #1 4 44 System guest_stack_pointer = fffff88002ffa660
13:44:57.889 DBG #1 4 44 System guest_inst_pointer = fffff88004c123a7
13:44:57.889 DBG #1 4 44 System IA32_VMX_CR0_FIXED0 = 80000021
13:44:57.889 DBG #1 4 44 System IA32_VMX_CR0_FIXED1 = ffffffff
13:44:57.905 DBG #1 4 44 System Original CR0 = 80050031
13:44:57.905 DBG #1 4 44 System Fixed CR0 = 80050031
13:44:57.905 DBG #1 4 44 System IA32_VMX_CR4_FIXED0 = 00002000
13:44:57.905 DBG #1 4 44 System IA32_VMX_CR4_FIXED1 = 003727ff
13:44:57.905 DBG #1 4 44 System Original CR4 = 000406f8
13:44:57.905 DBG #1 4 44 System Fixed CR4 = 000426f8
13:44:57.905 DBG #1 4 44 System VmEntryControls = 000013ff
13:44:57.905 DBG #1 4 44 System VmExitControls = 0003effb
13:44:57.905 DBG #1 4 44 System PinBasedControls = 00000016
13:44:57.905 DBG #1 4 44 System ProcessorBasedControls = 9680e172
13:44:57.905 DBG #1 4 44 System SecondaryProcessorBasedControls = 0010102e
13:44:57.905 DBG #1 4 44 System Context at FFFFF88004C123F8: rax= 0000000000000000 rbx= 0000000000000000 rcx= fffff88004cf2710 rdx= fffffa8002684070 rsi= fffffa8000dfe000 rdi= fffff88002ffa798 rsp= fffff88002ffa6e0 rbp= 0000000000000000 r8= 0000000000000065 r9= 0000000000000000 r10= 0000000000000000 r11= fffff88002ff9e60 r12= 0000000000000001 r13= ffffffff80000938 r14= fffffa80029bc780 r15= 000000000000001c efl= 00000282
13:44:57.905 INF #1 4 44 System Initialized successfully.
13:44:57.905 DBG #1 4 44 System Patch = FFFFF800029BA0E0, Exec = FFFFFA80012C40E0, RW = FFFFFA80012C30E0, Trampoline = FFFFFA80029ED480
13:44:57.920 INF #1 4 44 System Hook has been installed at fffff800029ba0e0 ExAllocatePoolWithTag.
13:44:57.920 DBG #0 4 44 System Patch = FFFFF800029BA0D0, Exec = FFFFFA80012C40D0, RW = FFFFFA80012C30D0, Trampoline = FFFFFA8002803D00
13:44:57.920 INF #0 4 44 System Hook has been installed at fffff800029ba0d0 ExFreePool.
13:44:57.920 DBG #0 4 44 System Patch = FFFFF800029BAD90, Exec = FFFFFA80012C4D90, RW = FFFFFA80012C3D90, Trampoline = FFFFFA8001D72930
13:44:57.920 INF #0 4 44 System Hook has been installed at fffff800029bad90 ExFreePoolWithTag.
13:44:57.920 DBG #0 4 44 System Patch = FFFFF8000289BC10, Exec = FFFFFA80012C6C10, RW = FFFFFA80012C5C10, Trampoline = FFFFFA8002380EC0
13:44:57.920 INF #0 4 44 System Hook has been installed at fffff8000289bc10 ExQueueWorkItem.
13:44:57.936 DBG #0 4 44 System Patch = FFFFF80002B998FC, Exec = FFFFFA80012C88FC, RW = FFFFFA80012C78FC, Trampoline = FFFFFA8002A479E0
13:44:57.936 INF #0 4 44 System Hook has been installed at fffff80002b998fc NtQuerySystemInformation.
13:44:57.952 INF #1 4 44 System DdiMon has been initialized.
13:44:57.967 INF #0 4 44 System The VMM has been installed.
Bugcheck:
SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: c000001d, Exception code that caused the bugcheck
Arg2: 00c3f000, Address of the instruction which caused the bugcheck
Arg3: 04f0a750, Address of the context record for the exception that caused the bugcheck
Arg4: 00000000, zero.
Debugging Details:
------------------
DUMP_CLASS: 1
DUMP_QUALIFIER: 402
BUILD_VERSION_STRING: 7601.17514.amd64fre.win7sp1_rtm.101119-1850
SYSTEM_MANUFACTURER: QEMU
SYSTEM_PRODUCT_NAME: Standard PC (i440FX + PIIX, 1996)
SYSTEM_VERSION: pc-i440fx-bionic
BIOS_VENDOR: SeaBIOS
BIOS_VERSION: 1.10.2-1ubuntu1
BIOS_DATE: 04/01/2014
DUMP_TYPE: 0
BUGCHECK_P1: c000001d
BUGCHECK_P2: fffffa8000c3f000
BUGCHECK_P3: fffff88004f0a750
BUGCHECK_P4: 0
EXCEPTION_CODE: (NTSTATUS) 0xc000001d - {EXCEPTION} Illegal Instruction An attempt was made to execute an illegal instruction.
FAULTING_IP:
+0
fffffa80`00c3f000 0f01c1 vmcall
CONTEXT: 04f0a750 -- (.cxr 0xfffff88004f0a750)
eax=009ea180 ebx=00000000 ecx=fffff880 edx=04f0b248 esi=00000001 edi=fffff880
eip=00000000 esp=00000000 ebp=fffff880 iopl=0 nv up di pl nz na po cy
cs=0000 ss=0018 ds=b138 es=0000 fs=2000 gs=0000 efl=00000001
00000000`00000000 ?? ???
Resetting default scope
CPU_COUNT: 2
CPU_MHZ: a81
CPU_VENDOR: GenuineIntel
CPU_FAMILY: 6
CPU_MODEL: 5e
CPU_STEPPING: 3
CPU_MICROCODE: 6,5e,3,0 (F,M,S,R) SIG: 1'00000000 (cache) 1'00000000 (init)
BUGCHECK_STR: 0x3B
PROCESS_NAME: conhost.exe
CURRENT_IRQL: c
ANALYSIS_SESSION_HOST: DESKTOP-B7SL839
ANALYSIS_SESSION_TIME: 07-27-2018 13:54:17.0368
ANALYSIS_VERSION: 10.0.14321.1024 amd64fre
IP_IN_FREE_BLOCK: 0
BAD_STACK_POINTER: 00000000
LAST_CONTROL_TRANSFER: from 00000000 to 00000000
STACK_TEXT:
00000000 00000000 00000000 00000000 00000000 0x0
STACK_COMMAND: kb
SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP: 0
BUCKET_ID: INVALID_KERNEL_CONTEXT_0x3B
DEFAULT_BUCKET_ID: INVALID_KERNEL_CONTEXT_0x3B
PRIMARY_PROBLEM_CLASS: INVALID_KERNEL_CONTEXT
FAILURE_BUCKET_ID: INVALID_KERNEL_CONTEXT_0x3B
TARGET_TIME: 2018-07-27T11:45:07.000Z
OSBUILD: 7601
OSSERVICEPACK: 1000
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 272
PRODUCT_TYPE: 1
OSPLATFORM_TYPE: x64
OSNAME: Windows 7
OSEDITION: Windows 7 WinNt (Service Pack 1) TerminalServer SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2010-11-20 10:30:02
BUILDDATESTAMP_STR: 101119-1850
BUILDLAB_STR: win7sp1_rtm
BUILDOSVER_STR: 6.1.7601.17514.amd64fre.win7sp1_rtm.101119-1850
ANALYSIS_SESSION_ELAPSED_TIME: 40b
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:invalid_kernel_context_0x3b
FAILURE_ID_HASH: {47d39f4d-e8b2-9190-8f3e-f596bd729a8d}
Thank you for the report @trapframe. I wish to support nesting in KVM, I do not have an environment to test. (as far as I know, KVM cannot be run on a Windows host.)
@wbenny you did some experiment with HyperPlatform on KVM and saw consistent bug check, right? Do you have any thoughts on what this issue is, and how we could resolve it?
@tandasat Actually, I didn't get any bugcheck, I just got frozen VM or worse, the whole host got frozen or got rebooted :) But it was FU_Hypervisor.
Anyway, undefined-instruction trap at vmcall
makes me think, that @trapframe runs HyperPlatform on KVM with enabled Hyper-V Enlightenments - this in turn might result in "seemingly random" vmcall
s. These vmcall
s are supposed to actually call KVM, but since HyperPlatform is installed, it traps into that - and if HyperPlatform encounters unsupported hypercall number (ecx), it injects undefined instruction. Which is exactly my guess at what has happened.
@trapframe , can you, please, provide parameters with which you run that KVM instance? I've had similar problem when I ran it with Hyper-V Enlightenments support turned on (hv_*
parameters, e.g. hv_spinlock
). If that's your case, try to run the KVM instance without any hv_*
parameters.
Also, I would be interested in REAL value of ecx/rcx, since ecx=fffff880
looks like a nonsense. The whole context actually looks like nonsense, because it's 32-bit. I would expect 64-bit context.
Edit: oh, the !analyze -v
record is all screaming with INVALID_KERNEL_CONTEXT
. How could I miss that :)
Tiips to disable HyperV feature for virt-manager created Vms :
sudo virt-xml $VMNAME --edit --features hyperv_relaxed=off,hyperv_vapic=off,hyperv_spinlocks=off
sudo virt-xml $VMNAME --edit --clock hypervclock_present=no
Happy that problem was resolved so quickly! Thank you @wbenny and @trapframe for the lightning fast comments and testing.
Nice! Glad it helped. I'll add a small personal observation from my experiments: in my case removing "hyperv_spinlocks" was enough to get rid of the buggy behavior. Leaving the rest wasn't problem and I would suspect it might be beneficial to leave them there for performance reasons.