asamy/ksm

PC just freezes after starting ksm_um.exe

no-realm opened this issue · 61 comments

Type of this issue (please specify)

  • This is a bug in the upstream tree as-is unmodified. (only added debug print messages)

System information

  1. CPU: Intel i7 3770K (Codename: Ivy Bridge)
  2. Kernel: NT Kernel (ntkrnlmp.exe)
  3. Kernel version: 10.0.14393 Build number: ?
  4. OS: Windows 10 x64

Build Configuration

  • ENABLE_DBGPRINT
  • ENABLE_FILEPRINT
  • EPAGE_HOOK
  • INTROSPECT_ENGINE
  • PMEM_SANDBOX
  • ENABLE_RESUBV

I have to mention that this also happens when I just enable EPAGE_HOOK

Issue description

My issue is that after I start the user mode application (ksm_um.exe), my PC just freezes.
I have waited up to about 10 minutes without anything happening. No crash, nothing.
The last log entry in the log file is right before the DPC call. After that, nothing gets through.
When I tested other hypervisors, I always got some kind of feedback (good or bad) like a BSOD which would help to track down the issue.

asamy commented

Works fine for me with the same build configuration. I have Windows 10 build 15063.296, however. CPU configuration shouldn't really be an issue.

The initialization of EPAGE_HOOK should never cause this issue, as it's just a spin lock init plus an identity hash table initialization. Are you calling ksm_hook_epage?

No, like I mentioned. It freezes my PC right after executing the user mode application.
I also just tried it in a VM and it did not have any issues there :/

Maybe I haven't expressed myself clearly enough: The driver itself (sc start) loads just fine, but after the UM application sends the 'subvert' command, the pc freezes.

The annoying thing is that I don't even have a dump file or something similar to look through.
I will try limiting my CPUs and/or RAM amount. Maybe that will help.

asamy commented

Yeah, I understand. What I am saying is that since only the initialization of the epage hook is suspect, it shouldn't really cause the issue. If it really is epage that's causing the issue, can you modify do_ept_violation (vcpu.c) to look like this:

static bool do_ept_violation(struct ept_ve_around *ve)
{
	struct vcpu *vcpu = ve->vcpu;
	struct ept *ept = &vcpu->ept;
	struct ksm *k = vcpu_to_ksm(vcpu);
	struct ve_except_info *info = ve->info;

	if ((info->exit & EPT_VE_RWX) == 0) {	/* no access  */
		if (!ept_alloc_page(EPT4(ept, info->eptp),
				    EPT_ACCESS_ALL, info->gpa, info->gpa))
			return false;

		return true;
	}

	KSM_PANIC(EPT_BUGCHECK_CODE, EPT_UNHANDLED_VIOLATION, info->exit, info->gpa);
	return false;
}

And let me know if a bugcheck occurs then? Also, since you enabled fileprint, post the log too.

I don't think it has anything to do with epage hook.
I didn't try without it, since I didn't even come to a point where that would be relevant.
But as you suggested, I tried the modification you posted, but my PC still just froze.
No BSOD/Bugcheck.

The log isn't very eventful, but here it is:

ksm: CPU 1: check_dynamic_pgtables: PXE: FFFF8BC5E2F17000 PPE FFFF8BC5E2E00000 PDE FFFF8BC5C0000000 PTE FFFF8B8000000000
ksm: CPU 4: check_dynamic_pgtables: Addr 0x22DC7F3C 0x22C00F3C
ksm: CPU 4: DriverEntry: We're mapped at FFFFF800A13C0000 (size: 65536 bytes (64 KB), on 16 pages)
ksm: CPU 4: ksm_init: EPT/VPID caps: 0x00000F0106114141
ksm: CPU 4: ksm_init: 8 physical memory ranges
ksm: CPU 0: ksm_init: Range: 0x0000000000001000 -> 0x000000000009F000
ksm: CPU 0: ksm_init: Range: 0x0000000000100000 -> 0x0000000020000000
ksm: CPU 0: ksm_init: Range: 0x0000000020200000 -> 0x0000000040004000
ksm: CPU 4: ksm_init: Range: 0x0000000040005000 -> 0x00000000C8F1C000
ksm: CPU 4: ksm_init: Range: 0x00000000C9848000 -> 0x00000000C98D4000
ksm: CPU 4: ksm_init: Range: 0x00000000CA177000 -> 0x00000000CA178000
ksm: CPU 4: ksm_init: Range: 0x00000000CA1BB000 -> 0x00000000CABF6000
ksm: CPU 2: ksm_init: Range: 0x00000000CAFF2000 -> 0x00000000CB000000
ksm: CPU 0: DriverEntry: ready
ksm: CPU 6: DriverEntry: ret: 0x00000000
ksm: CPU 0: DriverDispatch: open from ksm_um.exe
ksm: CPU 3: DriverDispatch: ksm_um.exe: IOCTL: 0x8008E008 of length: 0

Edit:
Like I mentioned above, I tested it with limited ram and/or cpus, and the results are a bit surprising.
The cpu count doesn't seem to influence whether it freezes or not, but when I limited my ram amount to 4GB, it didn't freeze...
After a couple minutes, a BSOD popped up though. The same as mentioned in #19 but without sandboxing any process. DRIVER_IRQL_NOT_LESS_OR_EQUAL

asamy commented

Weird about the memory issue, 8 physical memory ranges shouldn't be too much for it to handle. I don't know what the closest possible cause is. If it were too much for NonPagedPool, then it would have crashed, I think.
Can you upload the minidump, ksm.sys, ksm.pdb and ntoskrnl.exe?

You mean the minidump for the 4GB RAM crash?
Edit: I am trying to reproduce the BSOD crash right now, but it hasn't crashed so far.
Edit2: It's getting even weirder now. Just now my system just restarted. No BSOD or minidump.
Edit3: This time it just froze about one minute after successfully subverting the CPUs. No crash here either.

Now I finally got a crash and a dump file that can be examined.
Here are the relevant files:
files.zip
I have included the ksm log, minidump, ksm.sys, ksm.pdb and ntoskrnl.exe.

Edit: This here may actually be the right kernel file. At least WinDbg says it is the mapped one.
ntkrnlmp.zip

asamy commented

So seems to be an LIDT problem:

6: kd> u 0xffff9c0d`23a020bc
ffff9c0d`23a020bc 0f23f8          mov     dr7,rax
ffff9c0d`23a020bf 0f019d78040000  lidt    tbyte ptr [rbp+478h]  # faulting PC
ffff9c0d`23a020c6 ebe3            jmp     ffff9c0d`23a020ab
ffff9c0d`23a020c8 33c0            xor     eax,eax
ffff9c0d`23a020ca 41bb20000000    mov     r11d,20h
ffff9c0d`23a020d0 41898424e0050000 mov     dword ptr [r12+5E0h],eax
ffff9c0d`23a020d8 41f78424d006000000000040 test dword ptr [r12+6D0h],40000000h
ffff9c0d`23a020e4 48bee8b4c891583fa0a3 mov rsi,0A3A03F5891C8B4E8h

This happens when KSM fails to read from guest VA and therefore injects a page fault. I think it might be a fault in how KSM reads from guest virtual memory. I will take a look at this again in a few minutes, but in the meantime... can you try without commit c956379?

So, I have tried it without that commit and got a crash.
Here are the files:
ksm.zip
NT kernel should be the same.

asamy commented

Commit 172ca1f should do it.

ok, I will try

So, I have tried the latest commit but had no luck so far.
I can't get Windows to crash, it just freezes, so there is no dump for now.

asamy commented

Does it freeze when coming back from a sleep state? I wonder if you can let a VM freeze while having a debugger attached then break when it hangs to see where it's hanging at.

I would have to set up a new VM since I somehow corrupted my old one when experimenting with hypervisors.
I will report when I have tested it in a VM. I would have to either use the release build or use a static c-runtime for the debug build to run it in the VM though.

And I also don't really know how to debug a VM :/

asamy commented

Just use the release build for the UM application, and install the VC runtime for it. You can keep the debug build of the driver.

See VirtualKD, it makes this easier with VMware: http://virtualkd.sysprogs.org/

Ugh, when I try to subvert the cpus, I get the error code ret: 0xC0000024.
Edit: never mind, it is a bit late for me ^^

Edit2: So, I have got it running in a VM now and WinDbg is attached to it.
I couldn't reproduce either the freezing or the crashing yet, though.

The VM seems to be stable ..?
In the VM I use 4GB RAM and 4 CPUs, where on my host machine I have 16GB RAM and 8 CPUs.
When subverting the CPUs on my host machine, it still freezes 😢 Everything above 4GB RAM seems to cause a freeze and running the PC with just 4GB RAM seems a bit unacceptable to me.
I don't know where it hangs since I can't, or at least I don't know how to, debug the driver on my host machine.
I can't seem to be able to replicate the freezing on my VM with more than 4GB RAM though.

asamy commented

What if you reduce the number of EPTPs? Change EPTP_INIT_USED (ksm.h) to 1 and see if that works for you.

So, just tried reducing EPTP_INIT_USED to 1, but my system still freezes. I honestly don't understand why 😢

Well, this is interesting.
I tried using the v1.4 release and it actually successfully subverted all CPUs on my main machine ..

Edit: Ah, and also, my VS complains about the ksm.inf file.
Something about [DestinationDirs] and [SourceDisksFiles] being defined twice.
I just merged the second definitions into the first ones.

asamy commented

Hmm, that's odd. Do you mind bisecting? Since I don't get this on my machine.

Sorry for my dumb question, what do you mean by 'bisecting'?

asamy commented

I mean using git bisect. This will help find the offending commit since v1.4.

Ah, ok sure. Just give me a bit of time :)

So, this is what git bisect gave me:

76543619c9c475c4cec7f50dd1890680a065e722 is the first bad commit
commit 76543619c9c475c4cec7f50dd1890680a065e722
Author: Ahmed Samy <asamy@protonmail.com>
Date:   Fri Jan 6 15:25:32 2017 +0200

    Sandbox: new module (early-stage) and lots of fixes

    There is a lot of work to do for this to be fully functional, but I decided to
    commit it for those interested.

    Also cleaned up a lot of stuff, most importantly:
            1. Host/Origin cr3 usage for Windows
            2. EPTP list usage

    Now it's a lot more cleaner for allocating/using EPT pointers.

    And now subverting is up to a user mode process instead of automatic on driver
    load.

    Signed-off-by: Ahmed Samy <asamy@protonmail.com>

:040000 040000 794a6084883245576814516202c2a4c9c0ee892c 1ecd69e768a3c03b6245dc8a9870a401b3d50ed6 M     Documentation
:100644 100644 3a98b0e4e2e96aa913dff9f051f41752c9ad3a6d 04b52817c10cf8c3789dc6fef3c813eec6eff2ed M      Makefile
:100644 100644 595742ea741489e697505631a556260061c259d0 61cce6919a93da16476e5b24cb00c372a7aa114a M  Makefile.windows
:100644 100644 db8fb3a405a616b86dc6eb263b99ce724708ca66 95ee1b9640bf9afa0fcf5ee68c8717ecaf844616 M  bitmap.h
:100644 100644 79bb2839d07f6c749130bf67400b6f5eabdef708 a28dd61770b99ee40864a854ce9d7c8b77d36ec0 M  compiler.h
:100644 100644 bce8eb297c357cf43b8e0a789cd6bb34d003fc82 a3dae2aba2288b171a8dd7f5dc323670b903aa59 M      exit.c
:100644 100644 918a10dcb338f0278f27b045ab961d7dcc6ee052 e822d793e9b8b5ac74659a13a698aa56446a1a43 M  htable.c
:100644 100644 65fce8a50fb49d224fc98f25ecc28b05cb0c1345 7021c1f902a24ff2584cfa6d0b021bfdd5a18638 M  ksm.c
:100644 100644 7c76cee3d722ff54f875054ade0d8890a57334fa 70c03a5ae1ef32522243ea8580d1e96024f39ec8 M  ksm.h
:040000 040000 d6dd0dcc91c524948f0b577c75186c24e403481d 1527bf473e937d33e89d7057f8c484d4e6414050 M      ksm
:000000 100644 0000000000000000000000000000000000000000 e1112da3a00d8799f43f72a803f07b49d1dc785a A  list.h
:100644 100644 9bab149c759da89c555470836009e2de8d4a3518 dc3c3b2d8e235417362b2a44dad22cbc70b2b087 M  main_linux.c
:100644 100644 3e715b05526805007aed3961c377b8219e3cec53 c74bd9b90cd5c47718564981b887ffac78be5cd6 M  main_nt.c
:100644 100644 93427a165bcf7d5dc1a8906db928e5b65a008502 0794a5cad6098349ce3a1f503fe4e6902d8164e4 M      mm.c
:100644 100644 1eeeae6865d32756e68c44b95f166be2f6e63934 5685d1cf1acecac247036a78d2aad5aff62f9916 M  mm.h
:000000 100644 0000000000000000000000000000000000000000 70dfaf0e2e6ca88cdbc6dd14ccc7f215e0275bd6 A  sandbox.c
:000000 040000 0000000000000000000000000000000000000000 131c1d15c7f59e2472478cdcc9b4da2306be3983 A  um
:100644 100644 df43559b6652bd9190059f406a558dad1d93b0ce 6bf1050d63156198dd0484680d4f2f6b451b9678 M      vcpu.c

I have to mention that this commit here

# good: [d00a6657dc0549c8b0f1abc9eef8ee758533dca5] Documentation/CONTRIBUTIONS: fix intendation of code example

gave me two crashes (for which I don't have dump files) but worked after that weirdly enough.

I don't know whether this has anything to do with the freezing, but the first bad commit mentioned above, is the first commit that has the user mode application.

asamy commented

So to confirm, 7654361 causes a crash/freeze? Can you upload the minidump and ksm then?

It causes a freeze, but no crash, so I don't have a crash dump.

asamy commented

See if latest commit fixes the bug.

Ok, one sec.

Nope, sadly not. It still freezes.

I am currently trying to track down the issue by manually causing crashes.
It's a stupid and slow method, but for now the only one :/
I will report if I find anything.

So, this is weird.
It never seems to leave the percpu dpc callback __percpu_##name (in percpu.h).
But it executes the KeSignalCallDpcDone(sys0); line.

Look at this:

#define DEFINE_DPC(name, call, ...)	\
	VOID __percpu_##name(PRKDPC dpc, void *ctx, void *sys0, void *sys1)	\
	{									\
		UNREFERENCED_PARAMETER(dpc);					\
		__g_dpc_logical_rval |= (call) (__VA_ARGS__);			\
		KeSignalCallDpcSynchronize(sys1);				\
		KeSignalCallDpcDone(sys0);					\
		KSM_PANIC(0, 0, 0, 0xFFF);	/* this causes a crash */	\
	}

#define CALL_DPC(name, ...) do {						\
	__g_dpc_logical_rval = 0;						\
	KeGenericCallDpc(__percpu_##name, __VA_ARGS__);				\
	KSM_PANIC(0, 0, 0, 0xFF);	/* this does not.. */			\
} while (0)

asamy commented

That's the virtualization probe callback. This is the call tree:

ksm_subvert()
     __percpu___call_init()
         __ksm_init_cpu()
              vcpu_init
                   init_ept # this causes the infinite loop for sure.
                       __vmx_vminit

So if you want to investigate then looking at init_ept is the right path here. I'd also have a breakpoint after __vmx_vmlaunch call. I'd also pass the return value of (call) in the percpu callback to KSM_PANIC.

It seems like all 8 CPUs get virtualized successfully.
Or at least the __ksm_init_cpu function for all 8 CPUs completes without any errors.
I have put this right before the return 0; statement in the __ksm_init_cpu function:

if (k->active_vcpus >= 8)
		return k->active_vcpus;

And then in the percpu dpc callback:

#define DEFINE_DPC(name, call, ...)	\
	VOID __percpu_##name(PRKDPC dpc, void *ctx, void *sys0, void *sys1)	\
	{									\
		UNREFERENCED_PARAMETER(dpc);					\
		__g_dpc_logical_rval |= (call) (__VA_ARGS__);			\
		KeSignalCallDpcSynchronize(sys1);				\
		KeSignalCallDpcDone(sys0);					\
		if (DPC_RET() != 0) \
			KSM_PANIC(DPC_RET(), 0, 0, 0xFFF); \
	}

The crash dump shows that 8 virtual CPUs were initialized BugCheck E2, {8, 0, 0, fff}.

Now I am a bit baffled...
I put this at the beginning of the __ksm_init_cpu function:

	if (k->active_vcpus >= 8)
		return k->active_vcpus;

And this for the callback:

		if (DPC_RET() > 8) \
			KSM_PANIC(DPC_RET(), 0, 0, 0xFFF); \

If I understand it right, this should never catch?
But the crash dump shows this: BugCheck E2, {f, 0, 0, fff}
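
A side note on reading that value: in the DEFINE_DPC macro quoted above, each CPU's return value is OR-ed into __g_dpc_logical_rval, so DPC_RET() is presumably a bitwise OR of all return values rather than a count or a maximum. The sketch below is plain user-mode C with a hypothetical helper name (accumulate_rval is not part of KSM). It only illustrates how values that each pass a "> 8" check can combine into one that does not, and it does not claim to explain this particular bugcheck.

```c
#include <stddef.h>

/* Hypothetical stand-in for the accumulation done in DEFINE_DPC:
 * every per-CPU return value is OR-ed into one shared result. */
static long accumulate_rval(const long *rets, size_t n)
{
	long rval = 0;
	size_t i;

	for (i = 0; i < n; ++i)
		rval |= rets[i];	/* same operator as __g_dpc_logical_rval |= ... */

	return rval;
}
```

For example, OR-ing the return values 7 and 8 gives 0xF, which is larger than either input.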

Your latest commit (d650239) with the bitmap fix causes a DRIVER_IRQL_NOT_LESS_OR_EQUAL for me.
This is what WinDbg gives me:

FAULTING_SOURCE_CODE:  
    40: 	unsigned long name[DIV_ROUND_UP(bits, BITMAP_BITS)]
    41: 
    42: static inline void set_bit(unsigned long nr, unsigned long *bmp)
    43: {
>   44: 	bmp[BIT_WORD(nr)] |= BIT_MASK(nr);
    45: }
    46: 
    47: static inline void clear_bit(unsigned long nr, unsigned long *bmp)
    48: {
    49: 	bmp[BIT_WORD(nr)] &= ~BIT_MASK(nr);

Your new commit (b895792) seems to either fix or break something.
Compiling it without any changes resulted in a crash.
After looking at the crash dump, I decided to revert your changes in the previous commit (d650239), since it crashed in the test_bit (in bitmap.h) function, which got called after __vmx_vminit returned something smaller than 0.
After I did that, it didn't crash, but the virtualization failed with error code 7.
This is the DebugView log:

00000001	0.00000000	ksm: CPU 0: check_dynamic_pgtables: PXE: FFFFBF5FAFD7E000 PPE FFFFBF5FAFC00000 PDE FFFFBF5F80000000 PTE FFFFBF0000000000
00000002	0.00000322	ksm: CPU 0: check_dynamic_pgtables: Addr 0x22DB8F3C 0x22C00F3C
00000003	0.00001463	ksm: CPU 0: DriverEntry: We're mapped at FFFFF800CE9A0000 (size: 81920 bytes (80 KB), on 20 pages)
00000004	0.00012668	ksm: CPU 0: ksm_init: EPT/VPID caps: 0x00000F0106114141
00000005	0.00012902	ksm: CPU 0: ksm_init: 8 physical memory ranges
00000006	0.00013107	ksm: CPU 0: ksm_init: Range: 0x0000000000001000 -> 0x000000000009F000
00000007	0.00013283	ksm: CPU 0: ksm_init: Range: 0x0000000000100000 -> 0x0000000020000000
00000008	0.00013487	ksm: CPU 0: ksm_init: Range: 0x0000000020200000 -> 0x0000000040004000
00000009	0.00013663	ksm: CPU 0: ksm_init: Range: 0x0000000040005000 -> 0x00000000C8F1C000
00000010	0.00013839	ksm: CPU 0: ksm_init: Range: 0x00000000C9848000 -> 0x00000000C98D4000
00000011	0.00014014	ksm: CPU 0: ksm_init: Range: 0x00000000CA177000 -> 0x00000000CA178000
00000012	0.00014190	ksm: CPU 0: ksm_init: Range: 0x00000000CA1BB000 -> 0x00000000CABF6000
00000013	0.00014394	ksm: CPU 0: ksm_init: Range: 0x00000000CAFF2000 -> 0x00000000CB000000
00000014	0.00017262	ksm: CPU 0: DriverEntry: ready
00000015	0.00017437	ksm: CPU 0: DriverEntry: ret: 0x00000000
00000016	6.82350826	ksm: CPU 4: DriverDispatch: open from ksm_um.exe
00000017	6.82351542	ksm: CPU 4: DriverDispatch: ksm_um.exe: IOCTL: 0x8008E008 of length: 0
00000018	6.96509171	ksm: CPU 0: vcpu_run: 1: something went wrong: 7
00000019	6.96509743	ksm: CPU 0: __ksm_init_cpu: ksm_um.exe: Started: 0
00000020	6.96546650	ksm: CPU 1: vcpu_run: 1: something went wrong: 7
00000021	6.96547031	ksm: CPU 1: __ksm_init_cpu: System: Started: 0
00000022	6.96717167	ksm: CPU 6: vcpu_run: 1: something went wrong: 7
00000023	6.96717501	ksm: CPU 6: __ksm_init_cpu: System: Started: 0
00000024	6.96732521	ksm: CPU 2: vcpu_run: 1: something went wrong: 7
00000025	6.96732807	ksm: CPU 2: __ksm_init_cpu: System: Started: 0
00000026	6.96743393	ksm: CPU 7: vcpu_run: 1: something went wrong: 7
00000027	6.96743679	ksm: CPU 7: __ksm_init_cpu: System: Started: 0
00000028	6.96744967	ksm: CPU 4: vcpu_run: 1: something went wrong: 7
00000029	6.96745253	ksm: CPU 4: __ksm_init_cpu: System: Started: 0
00000030	6.96762228	ksm: CPU 5: vcpu_run: 1: something went wrong: 7
00000031	6.96762514	ksm: CPU 5: __ksm_init_cpu: System: Started: 0
00000032	6.96768761	ksm: CPU 3: vcpu_run: 1: something went wrong: 7
00000033	6.96769142	ksm: CPU 3: __ksm_init_cpu: System: Started: 0
00000034	6.99793959	ksm: CPU 0: DriverDispatch: close from ksm_um.exe

asamy commented

Latest should fix bitmap once and for all.

Is it really supposed to be:

static inline unsigned long __ffs(unsigned long x)
{
#ifdef _MSC_VER
	unsigned long i;
	_BitScanForward(&i, x);
	return i + 1;
#else
	return __builtin_ffs(x);
#endif
}

I am asking, because in this commit (fa654b6), you removed the addition, but now in your latest commit (07e4334), you added it again?

With the addition, I am getting a freeze (like always) and without, __vmx_vmlaunch fails with vm error code 7.

asamy commented

Commit 07e4334 is all the bitmap commits merged into one, because it was a mess. That's the final version.

_BitScanForward gives the zero-based index of the first set bit, not its one-based position, so we add 1 to stay in sync with __builtin_ffs (Linux/MinGW or GCC in general).

So this basically explains why we need to decrement eptp after the return of find_first_zero_bit, because we want index not number (hence set_bit, EPTP_*, etc.).
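
The index-versus-number distinction can be sketched in portable C. The helper names below (ffs_1based, first_zero_index) are hypothetical and not taken from bitmap.h, and the non-MSVC branch assumes GCC or Clang builtins.

```c
#ifdef _MSC_VER
#include <intrin.h>
#endif

/* Hypothetical helper: 1-based position of the lowest set bit, 0 if x == 0. */
static inline unsigned long ffs_1based(unsigned long x)
{
#ifdef _MSC_VER
	unsigned long i;

	if (!_BitScanForward(&i, x))	/* stores a 0-based index in i */
		return 0;
	return i + 1;			/* convert to a 1-based position */
#else
	return (unsigned long)__builtin_ffsl((long)x);	/* already 1-based */
#endif
}

/* Hypothetical helper: 0-based index of the first clear bit in a word,
 * or -1 if every bit is set. This is the kind of value set_bit() expects. */
static inline long first_zero_index(unsigned long word)
{
	unsigned long pos = ffs_1based(~word);

	return pos ? (long)pos - 1 : -1;
}
```

With this convention, a word of 0x7 (bits 0 to 2 taken) yields position 4 from ffs_1based(~word) and index 3 after the decrement.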

I don't see why it would fail with that error code (invalid control field), which I'd assume refers to the EPT_POINTER control field. Maybe add debug output in ept_create_ptr etc. to see if it really works, but I tested the bitmap myself and it works just fine.

Or just add a function to tell you which __vmx_vmwrite failed, something like:

static inline u8 debug_vmx_vmwrite(const char *field, size_t nr, size_t value)
{
      u8 err = __vmx_vmwrite(nr, value);
      if (err != 0)
           KSM_DEBUG("error writing 0x%016llX to %s: %d\n", value, field, err);
      return err;
}

#define DEBUG_VMX_VMWRITE(field, value) \
      debug_vmx_vmwrite(#field, field, value)

And replace the __vmx_vmwrite calls in vcpu_run with DEBUG_VMX_VMWRITE.

asamy commented

May have been an MTRR issue, can you re-try with latest?

Ok, I will when I come home.

Sorry for the late response.
Anyway, I tested it with the latest commit (d5b617e), but the issue persists.
It still freezes after I start subverting the CPUs.

I git bisected again, and this time I got a different result.
The first bad commit is 350a34d.

asamy commented

I don't think that's relevant, the bug is in some other commit for sure, keep bisecting.

Check your Windows '__ffs64' implementation. The first argument to _BitScanForward64 is the wrong size.
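
For context, the MSVC intrinsic _BitScanForward64 takes an unsigned long * for its index argument, and unsigned long is 32 bits wide on Windows, so passing the address of a 64-bit variable leaves its upper half unwritten. A minimal sketch of a correctly sized wrapper follows. ffs64_sketch is a hypothetical name, not the function from the KSM tree, and the non-MSVC branch assumes GCC or Clang.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>
#endif

/* Hypothetical helper: 1-based position of the lowest set bit in a
 * 64-bit value, 0 if x == 0 (same convention as __builtin_ffsll). */
static inline unsigned long ffs64_sketch(uint64_t x)
{
#ifdef _MSC_VER
	unsigned long i;	/* must be unsigned long, not a 64-bit type */

	if (!_BitScanForward64(&i, x))	/* returns 0 when no bit is set */
		return 0;
	return i + 1;
#else
	return (unsigned long)__builtin_ffsll((long long)x);
#endif
}
```

The mask argument stays 64 bits wide, so bit 32 and above are still found; only the index variable is the narrower type.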

asamy commented

Thanks! Should be fixed now.

Was this bug fixed? I've encountered the same problem. Every time I run ksm_um.exe, the system freezes, even when I remove all the macros other than ENABLE_DBGPRINT. OS: Windows 10 x64, CPU: i5 7200U, Kernel: 16299.

asamy commented

No, it wasn't.

I can't reproduce, if you can find which part is broken, I will look into it.
You can do that by disabling features until you find what could be wrong. See comments in this issue for what and where.

Tried several combinations and finally found:

  1. Comment out SECONDARY_EXEC_ENABLE_VMFUNC | SECONDARY_EXEC_ENABLE_VE in vcpu.c
  2. Change the GUEST_IDTR_BASE write in vcpu.c to err |= vmcs_write(GUEST_IDTR_BASE, idtr->base);

Then running ksm_um.exe causes no freeze or crash, but after a while the system hangs too.

asamy commented

What did you change the GUEST_IDTR_BASE to?

Also, if you're gonna change the IDTR, then comment out SECONDARY_EXEC_DESC_TABLE_EXITING.
See if that solves the hanging problem.

Changed from vcpu->idt.base to idtr->base.
Tried your proposal; it seems it doesn't hang now.

But if I disable SECONDARY_EXEC_ENABLE_VMFUNC, it seems I can't use the EPT-related tricks (EPT hook, introspect...).

asamy commented

Yep, that's because they require vmfunc. You can use VMFUNC without #VE.
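
As a rough sketch of what that looks like in terms of control bits: the bit positions below follow the secondary processor-based VM-execution control layout in the Intel SDM, while the SKETCH_ macro names and the helper are hypothetical and not taken from the KSM headers.

```c
#include <stdint.h>

/* Bit positions per the Intel SDM; names are illustrative only. */
#define SKETCH_ENABLE_VMFUNC	(1u << 13)	/* enable VM functions */
#define SKETCH_ENABLE_VE	(1u << 18)	/* EPT-violation #VE */

/* Hypothetical helper: turn VMFUNC on and #VE off, leave other bits alone. */
static inline uint32_t vmfunc_without_ve(uint32_t secondary_ctl)
{
	secondary_ctl |= SKETCH_ENABLE_VMFUNC;
	secondary_ctl &= ~SKETCH_ENABLE_VE;
	return secondary_ctl;
}
```

The result would still need to be checked against the allowed settings reported by the IA32_VMX_PROCBASED_CTLS2 MSR before being written to the VMCS.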

However, if you want to fix that freeze/crash, can you try backporting #VE handling and IDT shadowing to how v1.4 does them? v1.4 is commit 0cb7dd5.

I reverted to ksm-1.4, commented out #VE and SECONDARY_EXEC_DESC_TABLE_EXITING, and changed GUEST_IDTR_BASE to idtr->base. Now the epage_hook works fine.

Hope the current version has this bug fixed.

asamy commented

I don't understand what you changed from the current modifications you already made anyways.

There is a reason why I told you to only backport these specific parts, v1.4 is v1.4.

If you don't want to work on fixing it or help to fix it, then it won't be fixed any time soon.

hzqst commented

@asamy
I am able to repro this issue and fixed it by commenting out the ShadowIDT feature.
That's because the shadowed idtr.base you allocated yourself is not mapped by EPROCESS::UserDirectoryBase, which means the guest OS has no way to access idtr.base in this situation. That leads to an infinite PageFault (the PageFault entry is not available either).
This occurs only when KPTI/KvaShadow is enabled for the OS.

asamy commented

@hzqst That's a different issue, pretty sure this was posted before KPTI was even discovered and reported. What you reported, can be related, however. So can you open an issue with that? I can look into this later.

I'm getting a SYSTEM_THREAD_EXCEPTION_NOT_HANDLED bluescreen both when I try the epage hook example in the driver and when I try your test binary. I'm on Win10 1709 with a latest-gen i5. This happens both in VMware 14 and on normal Windows. Also, I have CPU overclocking disabled. Here's a memory dump of the epage hook test:

111818-10812-01.zip

I have found that this problem is caused by mov cr8, rax in nt!KiGenericCallDpcWorker+0x111 on Win10 1903, but I do not know why.