tandasat/DdiMon

DPC_WATCHDOG_VIOLATION from Shadow Hooks

frostiest opened this issue · 8 comments

I'm dealing with an issue similar to the one in #26: security software freezes my system when it loads and eventually triggers a DPC_WATCHDOG_VIOLATION. I applied all the changes from that thread, but nothing helps. I have confirmed the issue only arises when I hook something; otherwise there is no freeze.

I'm using the latest DdiMon, and here's the analysis from MEMORY.DMP:
`
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff8045cd73358, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding this single DPC timeout

Debugging Details:




*** Either you specified an unqualified symbol, or your debugger ***
*** doesn't have full symbol information. Unqualified symbol ***
*** resolution is turned off by default. Please either specify a ***
*** fully qualified symbol module!symbolname, or enable resolution ***
*** of unqualified symbols by typing ".symopt- 100". Note that ***
*** enabling unqualified symbol resolution with network symbol ***
*** server shares in the symbol path may cause the debugger to ***
*** appear to hang for long periods of time when an incorrect ***
*** symbol name is typed or the network symbol server is down. ***


*** For some commands to work properly, your symbol path ***
*** must point to .pdb files that have full type information. ***


*** Certain .pdb files (such as the public OS symbols) do not ***
*** contain the required information. Contact the group that ***
*** provided you with these symbols if you need this command to ***
*** work. ***


*** Type referenced: TickPeriods ***



KEY_VALUES_STRING: 1

PROCESSES_ANALYSIS: 1

SERVICE_ANALYSIS: 1

STACKHASH_ANALYSIS: 1

TIMELINE_ANALYSIS: 1

DUMP_CLASS: 1

DUMP_QUALIFIER: 402

BUILD_VERSION_STRING:

SYSTEM_PRODUCT_NAME: To Be Filled By O.E.M.

SYSTEM_SKU: To Be Filled By O.E.M.

SYSTEM_VERSION: To Be Filled By O.E.M.

BIOS_VENDOR: American Megatrends Inc.

BIOS_VERSION: P3.10

BIOS_DATE: 07/04/2018

BASEBOARD_MANUFACTURER:

BASEBOARD_PRODUCT:

BASEBOARD_VERSION:

DUMP_TYPE: 0

BUGCHECK_P1: 0

BUGCHECK_P2: 501

BUGCHECK_P3: 500

BUGCHECK_P4: fffff8045cd73358

DPC_TIMEOUT_TYPE: SINGLE_DPC_TIMEOUT_EXCEEDED

CPU_COUNT: 6

CPU_MHZ: e10

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 9e

CPU_STEPPING: a

CPU_MICROCODE: 6,9e,a,0 (F,M,S,R) SIG: 96'00000000 (cache) 96'00000000 (init)

BLACKBOXBSD: 1 (!blackboxbsd)

BLACKBOXNTFS: 1 (!blackboxntfs)

BLACKBOXWINLOGON: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x133

PROCESS_NAME: System

CURRENT_IRQL: d

ANALYSIS_SESSION_HOST:

ANALYSIS_SESSION_TIME: 12-15-2019 22:05:37.0391

ANALYSIS_VERSION: 10.0.18362.1 amd64fre

LAST_CONTROL_TRANSFER: from fffff8045c9ee83d to fffff8045c9c1220

STACK_TEXT:
ffffbc01879bcb08 fffff8045c9ee83d : 0000000000000133 0000000000000000 0000000000000501 0000000000000500 : nt!KeBugCheckEx
ffffbc01879bcb10 fffff8045c81f857 : 0000014240a3c9a1 ffffbc01879c0180 0000000000000286 fffffd0a80837a10 : nt!KeAccumulateTicks+0x1cbe1d
ffffbc01879bcb70 fffff8045d2b91e1 : 0000000000000000 ffffd108652d4400 fffffd0a80837a90 ffffd108652d44b0 : nt!KeClockInterruptNotify+0xc07
ffffbc01879bcf30 fffff8045c8029e5 : ffffd108652d4400 0000000000000000 0000000000000000 ffff1d2c7670378f : hal!HalpTimerClockIpiRoutine+0x21
ffffbc01879bcf60 fffff8045c9c2cba : fffffd0a80837a90 ffffd108652d4400 000000000000fffe ffffd108652d4400 : nt!KiCallInterruptServiceRoutine+0xa5
ffffbc01879bcfb0 fffff8045c9c3227 : ffffbc0188259ef0 ffffbc0188259ef0 000000000000002a 0000000000000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
fffffd0a80837a10 fffff8045c81b9eb : ffffffffffffffd2 fffff8045cdab573 0000000000000010 0000000000000286 : nt!KiInterruptDispatchNoLockNoEtw+0x37
fffffd0a80837ba0 fffff8045cdab585 : ffffd108654fdfa0 ffffd108653f7840 0000000000000010 ffffd108654fd000 : nt!KeYieldProcessorEx+0x1b
fffffd0a80837bd0 fffff8045cdaa6e5 : 000000001eb9414f ffffbc01879c0180 ffffd10879a92a90 fffffd0a80837c90 : nt!IopLiveDumpProcessCorralStateChange+0x2d
fffffd0a80837c00 fffff8045c86ae85 : ffffbc01879c2f80 ffffd108653f6000 ffffd108673cf260 ffffbc0100000002 : nt!IopLiveDumpCorralDpc+0x55
fffffd0a80837c40 fffff8045c86a4df : ffffbc01879c0180 0000000000000000 0000000000000002 0000000000000004 : nt!KiExecuteAllDpcs+0x305
fffffd0a80837d80 fffff8045c9c8265 : 0c45b60f450d55b6 ffffbc01879c0180 0000000000000000 00000000c0000002 : nt!KiRetireDpcList+0x1ef
fffffd0a80837fb0 fffff8045c9c8050 : 0000000000000000 ffffd108672171e0 0000000000000000 fffff8045cc68c80 : nt!KxRetireDpcList+0x5
fffffd0a87966b20 fffff8045c9c7720 : 0000000000000000 fffffd0a87966bd0 ffffd108652d4400 0000000000000000 : nt!KiDispatchInterruptContinue
fffffd0a87966b50 fffff8045c901064 : ffffd108766c3550 fffff8045e4cc089 ffffa98c2f402a80 fffff8045c8c2cfc : nt!KiDpcInterrupt+0x2f0
fffffd0a87966ce0 fffff8045c8c12ae : fffff8045cc68c80 0000000000000002 ffffbc0187dc7180 0000000000001000 : nt!ExpWaitForSpinLockSharedAndAcquire+0x64
fffffd0a87966d20 fffff8045c855d97 : fffff8045cc68bc0 ffff9bfc02391008 ffffd108766c3601 fffff8045cc68bc0 : nt!MiLockWorkingSetShared+0xee
fffffd0a87966d50 fffff8045ced3bb5 : fffff804722000e0 fffff804722000e0 000000000000000c 0000000000000fff : nt!MiLockCode+0x147
fffffd0a87966ef0 fffff8047220614c : ffffd108766c3550 fffffd0a879670e0 ffffd108673afa40 ffffd108673afa40 : nt!MmResetDriverPaging+0xa5
fffffd0a87966f20 fffff80472206037 : 00000000656c6946 fffff8045cb6f06d fffffd0a879670e0 0000000000000000 : Msfs!MsCommonCreate+0xdc
fffffd0a87966ff0 fffff8045c831f39 : ffffd10865fe3010 ffffd108766c3550 0000000000000000 0000000000000000 : Msfs!MsFsdCreate+0x27
fffffd0a87967020 fffff8045e4fce8c : 0000000000000010 0000000000000000 0000000000000000 0000000000000000 : nt!IofCallDriver+0x59
fffffd0a87967060 fffff8045c831f39 : ffffd1087d5a8100 fffff8045cde590c 0000000000000000 0000000000000030 : FLTMGR!FltpCreate+0x46c
fffffd0a87967110 fffff8045c830fe4 : 0000000000000000 0000000000000000 ffffd108766c36f8 fffff8045c8317a3 : nt!IofCallDriver+0x59
fffffd0a87967150 fffff8045cde5ffb : fffffd0a87967410 fffff8045cde590c fffffd0a87967380 ffffd1087aa4a4e0 : nt!IoCallDriverWithTracing+0x34
fffffd0a879671a0 fffff8045cdecfcf : ffffd108673af8f0 ffffd108673af80c ffffd10877eaf7e0 ffffa98c2fa04000 : nt!IopParseDevice+0x62b
fffffd0a87967310 fffff8045cdeb431 : ffffd10877eaf700 fffffd0a87967558 0000000000000240 ffffd108652f7380 : nt!ObpLookupObjectName+0x78f
fffffd0a879674d0 fffff8045ce30300 : ffffd10800000001 fffffd0a87967998 0000000000000000 ffffbc0187dc7180 : nt!ObOpenObjectByNameEx+0x201
fffffd0a87967610 fffff8045ce2fa38 : fffffd0a879679e0 0000000000000080 fffffd0a87967998 fffffd0a87967988 : nt!IopCreateFile+0x820
fffffd0a879676b0 fffff8045c9d2b15 : ffffd108766c3550 0000000000000002 0000000000000170 0000000000000001 : nt!NtOpenFile+0x58
fffffd0a87967740 fffff8045c9c5060 : fffff8045ce447c4 0000000000000000 0000000000000000 0000000000040282 : nt!KiSystemServiceCopyEnd+0x25
fffffd0a87967948 fffff8045ce447c4 : 0000000000000000 0000000000000000 0000000000040282 fffff8045c869d59 : nt!KiServiceLinkage
fffffd0a87967950 fffff8045e503b6c : 0000000000000000 0000000000000000 0000000000000030 fffff8045e4ea800 : nt!IoGetDeviceObjectPointer+0x94
fffffd0a879679e0 fffff8045c8bd465 : ffffd108652a0500 ffffd10878185040 fffff8045e503aa0 ffffd108652a0500 : FLTMGR!FltpManualDeviceAttachWorker+0xcc
fffffd0a87967a70 fffff8045c92a725 : ffffd10878185040 0000000000000080 ffffd10865294380 000024efbd9bbfff : nt!ExpWorkerThread+0x105
fffffd0a87967b10 fffff8045c9c886a : ffffbc0187e79180 ffffd10878185040 fffff8045c92a6d0 0000000000000000 : nt!PspSystemThreadStartup+0x55
fffffd0a87967b60 0000000000000000 : fffffd0a87968000 fffffd0a87961000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x2a

THREAD_SHA1_HASH_MOD_FUNC: 20df6024a5c03bdd36d1680f5e2bf77cf839246c

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 34208103f2bc76fcde0bfeae8b7994e947ea59f9

THREAD_SHA1_HASH_MOD: 927a0c775c0b5510d0e1cbb5ae141a64633523eb

FOLLOWUP_IP:
Msfs!MsCommonCreate+dc
fffff804`7220614c 33d2 xor edx,edx

FAULT_INSTR_CODE: 8d48d233

SYMBOL_STACK_INDEX: 13

SYMBOL_NAME: Msfs!MsCommonCreate+dc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Msfs

IMAGE_NAME: Msfs.SYS

DEBUG_FLR_IMAGE_TIMESTAMP: 0

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: dc

FAILURE_BUCKET_ID: 0x133_DPC_Msfs!MsCommonCreate

BUCKET_ID: 0x133_DPC_Msfs!MsCommonCreate

PRIMARY_PROBLEM_CLASS: 0x133_DPC_Msfs!MsCommonCreate

TARGET_TIME: 2019-12-14T05:09:53.000Z

OSBUILD: 18362

OSSERVICEPACK: 0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 1

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

OSEDITION: Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: unknown_date

BUILDDATESTAMP_STR: 190318-1202

BUILDLAB_STR: 19h1_release

BUILDOSVER_STR: 10.0.18362.1.amd64fre.19h1_release.190318-1202

ANALYSIS_SESSION_ELAPSED_TIME: 19b1

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:0x133_dpc_msfs!mscommoncreate

FAILURE_ID_HASH: {349b4bae-9c52-cd03-915a-31077bf2ad14}

Followup: MachineOwner

`

I've been at this for a few days, so any help would be great.

Can you tell me which API causes the issue when hooked? I may be able to list a few possible causes.

It appears to be any API; it doesn't seem to matter which. As I said, though, if I don't have anything hooked but DdiMon/HyperPlatform is loaded, there's no freeze. I'm going to continue testing to see if it's maybe finding the int 3 at the start of a shadow page, but beyond that I don't have any ideas.

Understood. Which security software do you use? Some products have their own hypervisor and emulate VT-x, but can still cause compatibility issues like this.

If your security software has such a feature and disabling it is a solution, try that out. If you need to understand the root cause, uploading MEMORY.DMP with the PDB would be helpful, though I may not be able to take much time for analysis.

If it's all the same to you, I'd rather not name the security software, but I can tell you with certainty that it doesn't use a hypervisor: yesterday I tested hvpp, the hypervisor mentioned in #26, and it appears to work OK. I'd really rather use your software, though, if possible.

Do you have any ideas I can try, or do you need to debug it yourself?

A few things I've noticed during testing:

  • It doesn't care at all about the int 3 on the shadow page: if I call ShpSetupInlineHook but not ShEnableHooks, it loads OK. By the same token, if I comment out the int 3 but still call ShEnableHooks, it freezes.

  • I enabled kVmmpEnableRecordVmExit, but nothing is logged once it freezes, or even right before.

  • I tested on my main computer as well as in VMware. In VMware, with WinDbg attached, the guest freezes forever once the security driver loads; the debugger doesn't break in, and there's no bluescreen.

  • I read in the thread I linked that there aren't many differences between hvpp and this project, and hvpp seems to load fine, so I think whatever is causing this may be targeting HyperPlatform specifically.

I'll keep trying things and will share if I find an answer, but at this point I'm pretty stumped.

Hi, thanks for sharing your experiments and results. I would study hvpp more closely to see what the key differences might be, and also try VMware + GDB stub debugging. This would let you see where it is freezing. It often does not uncover what's wrong, but it still gives some more information.
https://www.triplefault.io/2017/07/setup-vmm-debugging-using-vmwares-gdb_9.html

Hey tandasat, good news: I finally confirmed the cause. The issue was similar to the other thread: MmCopyMemory was being called for large regions, many times, and was so slow that it caused the DPC_WATCHDOG_VIOLATION. For this security software, though, simply removing the use of MTF for the shadow hooks wasn't enough; I had to hook MmCopyMemory and modify its results, and finally the software seems to load OK. If you know of any other ways to speed that up, that'd be great, but until then I'm going to go ahead and close this thread. Thanks for the help.
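
The post above does not say how the MmCopyMemory results were modified. Purely as an illustration, and as an assumption rather than the poster's actual code, one thing such a hook could do after the copy completes is overwrite any bytes in the output buffer that overlap a hooked location with the original, unpatched bytes. The user-mode sketch below shows only that overlap logic; `PatchRecord` and `RestoreOriginalBytes` are hypothetical names, and no kernel API is called.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one inline hook: where the patch lives and which
// bytes were there before it was installed.
struct PatchRecord {
  std::uint64_t address;               // start address of the patched bytes
  std::vector<std::uint8_t> original;  // bytes that the patch replaced
};

// `buffer` holds a copy of the address range [source, source + length).
// For every patch that overlaps that range, write the original bytes back
// into the matching part of `buffer`. Returns the number of bytes restored.
std::size_t RestoreOriginalBytes(std::uint8_t* buffer, std::uint64_t source,
                                 std::size_t length,
                                 const std::vector<PatchRecord>& patches) {
  assert(buffer != nullptr || length == 0);
  std::size_t restored = 0;
  const std::uint64_t copy_end = source + length;
  for (const auto& patch : patches) {
    const std::uint64_t patch_end = patch.address + patch.original.size();
    // Intersection of the copied range and the patched range.
    const std::uint64_t begin = std::max(source, patch.address);
    const std::uint64_t end = std::min(copy_end, patch_end);
    if (begin >= end) {
      continue;  // this patch does not overlap the copied range
    }
    const auto skip = static_cast<std::ptrdiff_t>(begin - patch.address);
    const auto count = static_cast<std::size_t>(end - begin);
    std::copy_n(patch.original.begin() + skip, count,
                buffer + (begin - source));
    restored += count;
  }
  return restored;
}
```

In a real driver, the hook handler would presumably call the original MmCopyMemory first and then apply something like this to the target buffer; whether that alone avoids the slowdown depends on how the reads are serviced, which the thread does not describe.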

Thank you for sharing your findings. This should help others with a similar issue narrow down the cause by themselves.

Apart from not using MTF (IIRC, that is how hvpp does it), replacing 0xcc with actual jmp instructions for hooking might help, as it reduces the performance penalty of the extra VM-exits. Relevant discussion can be found here:
tandasat/SimpleSvmHook#1
I do not have performance metrics, though, so I do not know whether this is the right place to invest optimization effort.
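
To illustrate what replacing 0xcc with a jmp involves, here is a minimal user-mode sketch of how the two common x64 jump stubs are encoded. It is not code from DdiMon or SimpleSvmHook; the helper names `MakeRelativeJmp` and `MakeAbsoluteJmp` and the addresses in `Demo` are made up for the example, and the byte layout assumes a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: encodes `jmp rel32` (E9 + 32-bit displacement).
// The displacement is relative to the end of the 5-byte instruction, so the
// target must be within roughly +/-2 GiB of `from`; otherwise returns nullopt.
std::optional<std::array<std::uint8_t, 5>> MakeRelativeJmp(std::uint64_t from,
                                                           std::uint64_t to) {
  const auto delta = static_cast<std::int64_t>(to - (from + 5));
  if (delta < INT32_MIN || delta > INT32_MAX) {
    return std::nullopt;
  }
  const auto rel32 = static_cast<std::int32_t>(delta);
  std::array<std::uint8_t, 5> code = {0xE9};
  std::memcpy(&code[1], &rel32, sizeof(rel32));
  return code;
}

// Hypothetical helper: encodes `jmp qword ptr [rip+0]` (FF 25 00 00 00 00)
// followed by the 8-byte absolute target. Works for any 64-bit address but
// overwrites 14 bytes of the hooked function instead of 5.
std::array<std::uint8_t, 14> MakeAbsoluteJmp(std::uint64_t to) {
  std::array<std::uint8_t, 14> code = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
  std::memcpy(&code[6], &to, sizeof(to));
  return code;
}

// Usage sketch with made-up addresses.
inline void Demo() {
  const auto near_jmp = MakeRelativeJmp(0x1000, 0x2000);
  assert(near_jmp.has_value() && (*near_jmp)[0] == 0xE9);
  const auto far_jmp = MakeAbsoluteJmp(0xFFFFF80412345678ULL);
  assert(far_jmp[0] == 0xFF && far_jmp[1] == 0x25);
}
```

The trade-off is that a jmp overwrites 5 or 14 bytes rather than the single byte used by 0xcc, so more of the original prologue has to be relocated into the trampoline, and any instruction that gets split by the patch has to be copied whole.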