injdrv is a proof-of-concept Windows driver for injecting a DLL into user-mode processes using APCs.
Even though APCs are undocumented to a decent extent, the technique of using them to inject a DLL into a user-mode process is not new and has been discussed many times. Such an APC can be queued from a regular user-mode process (seen in Cuckoo) as well as from a kernel-mode driver (seen in Blackbone).
Despite its popularity, finding small, easy-to-understand and actually working projects that demonstrate this technique isn't easy. This project tries to fill that gap.
- Support for Windows 7 up to Windows 10
- Support for x86 & x64 architectures
- Ability to inject WoW64 processes
  - ...with an x86 DLL and even with an x64 DLL
- The DLL is injected in a very early process initialization stage
  - Injection is performed from the PsSetLoadImageNotifyRoutine callback
  - Native processes (x86 on Windows x86, x64 on Windows x64) are injected on the first DLL load after ntdll.dll
  - WoW64 processes are injected on the first DLL load after the following system DLLs are loaded: ntdll.dll (both x64 and WoW64), wow64.dll, wow64cpu.dll and wow64win.dll
  - Because of that, the injected DLL depends only on ntdll.dll
- The demonstration DLL hooks a few ntdll.dll functions
  - Achieved using DetoursNT
  - Detoured functions use ETW to report which hooked function has been called
Because the DetoursNT project is attached as a git submodule, which itself carries the Detours git submodule, don't forget to fetch them:
git clone --recurse-submodules git@github.com:wbenny/injdrv.git
After that, compile the project using Visual Studio 2017. A solution file is included. The only required dependency is the WDK.
When the driver is loaded, it registers two callbacks:
- For process create/exit notification (PsSetCreateProcessNotifyRoutineEx)
- For image load notification (PsSetLoadImageNotifyRoutine)
When a new process is created, the driver allocates a small structure that holds information relevant to the process injection, such as:
- Which DLLs are already loaded in the process
- Addresses of important functions (such as LdrLoadDll in ntdll.dll)
The start of a new Windows process is followed by mapping ntdll.dll into its address space and then by loading the DLLs from the process's import table. In the case of WoW64 processes, the following libraries are loaded immediately after the native ntdll.dll: wow64.dll, wow64cpu.dll, wow64win.dll and the second (WoW64) ntdll.dll. The driver is notified about the load of these DLLs and records this information.
When these DLLs are loaded, it is safe for the driver to queue the user-mode APC to the process, which will load our DLL into the process.
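The bookkeeping described above can be modelled as a small state machine. The sketch below is a portable user-mode model, not driver code: inject_info, on_process_create and on_image_load are hypothetical names, image names are plain lowercase C strings instead of UNICODE_STRING paths, and the caller is assumed to say whether the image is 32-bit (which is how the two ntdll.dll copies of a WoW64 process are told apart here). It only captures the rule that injection happens on the first image load after all required system DLLs have been seen.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags for the system DLLs the driver waits for. */
enum {
    DLL_NTDLL_NATIVE = 1 << 0, /* ntdll.dll (x64 on x64 Windows, x86 on x86 Windows) */
    DLL_WOW64        = 1 << 1, /* wow64.dll */
    DLL_WOW64CPU     = 1 << 2, /* wow64cpu.dll */
    DLL_WOW64WIN     = 1 << 3, /* wow64win.dll */
    DLL_NTDLL_WOW64  = 1 << 4  /* second, 32-bit ntdll.dll of a WoW64 process */
};

/* Hypothetical per-process structure allocated on process creation. */
typedef struct {
    bool is_wow64;
    unsigned loaded;   /* DLL_* flags seen so far */
    bool injected;
} inject_info;

static void on_process_create(inject_info *info, bool is_wow64)
{
    memset(info, 0, sizeof *info);
    info->is_wow64 = is_wow64;
}

static unsigned required_dlls(const inject_info *info)
{
    if (!info->is_wow64)
        return DLL_NTDLL_NATIVE;
    return DLL_NTDLL_NATIVE | DLL_WOW64 | DLL_WOW64CPU | DLL_WOW64WIN | DLL_NTDLL_WOW64;
}

/* Image-load notification. Returns true when the injection APC should be
   queued: on the first load that happens after all required DLLs are known. */
static bool on_image_load(inject_info *info, const char *name, bool is_32bit_image)
{
    const unsigned required = required_dlls(info);

    if (info->injected)
        return false;

    if ((info->loaded & required) == required) {
        info->injected = true;
        return true;
    }

    if (strcmp(name, "ntdll.dll") == 0)
        info->loaded |= (info->is_wow64 && is_32bit_image) ? DLL_NTDLL_WOW64 : DLL_NTDLL_NATIVE;
    else if (strcmp(name, "wow64.dll") == 0)
        info->loaded |= DLL_WOW64;
    else if (strcmp(name, "wow64cpu.dll") == 0)
        info->loaded |= DLL_WOW64CPU;
    else if (strcmp(name, "wow64win.dll") == 0)
        info->loaded |= DLL_WOW64WIN;

    return false;
}
```

Replaying a native load order (process.exe, ntdll.dll, kernel32.dll) through on_image_load returns true only for kernel32.dll; for a WoW64 process the same happens only once both ntdll.dll copies and the three wow64*.dll files have been seen.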
Although such a project might seem trivial to implement, there are some obstacles you might face along the way. Here I will try to summarize some of them:
- Injection of an x86 (or WoW64) DLL requires a small allocation inside the user-mode address space. This allocation holds the path to the DLL to be injected and a small shellcode, which basically calls LdrLoadDll with the DLL path as a parameter. This memory obviously requires PAGE_EXECUTE_READ protection, but the driver has to fill it somehow, and PAGE_EXECUTE_READWRITE is an unacceptable security concern.
It might be tempting to use ZwAllocateVirtualMemory and ZwProtectVirtualMemory, but unfortunately, the second function is exported only since Windows 8.1.
The solution used in this driver is to create a section (ZwCreateSection), map it (ZwMapViewOfSection) with PAGE_READWRITE protection, write the data, unmap it (ZwUnmapViewOfSection) and then map it again with PAGE_EXECUTE_READ protection.
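The section trick maps the same backing object twice, so no view is ever both writable and executable. The sketch below is only a POSIX user-mode analogue, assumed to run on Linux or another POSIX system: a deleted temporary file from tmpfile() stands in for the section, mmap/munmap stand in for ZwMapViewOfSection/ZwUnmapViewOfSection, and map_filled_view is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* POSIX analogue of the section trick: fill a read/write view, unmap it,
   then map the same object again with the final (non-writable) protection.
   Returns NULL on failure. */
static void *map_filled_view(const void *data, size_t len, int final_prot)
{
    FILE *backing = tmpfile();          /* plays the role of ZwCreateSection */
    void *view = MAP_FAILED;

    if (backing == NULL)
        return NULL;

    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)len) == 0) {
        /* First view: PAGE_READWRITE in the driver. */
        void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (rw != MAP_FAILED) {
            memcpy(rw, data, len);
            munmap(rw, len);            /* ZwUnmapViewOfSection */
            /* Second view: PAGE_EXECUTE_READ in the driver. */
            view = mmap(NULL, len, final_prot, MAP_SHARED, fd, 0);
        }
    }

    fclose(backing);                    /* the mapping keeps the pages alive */
    return view == MAP_FAILED ? NULL : view;
}
```

In the driver, the second mapping uses PAGE_EXECUTE_READ. The POSIX counterpart would be PROT_READ | PROT_EXEC, which hardened systems (for example, a /tmp mounted noexec) may refuse, so the sketch takes the final protection as a parameter.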
- Using sections raises another problem. Since this driver performs injection from the image load notification callback, which is often called from the NtMapViewOfSection function, we'd be calling MapViewOfSection recursively. This wouldn't be a problem if mapping the section didn't lock EPROCESS->AddressCreationLock. But it does, so we would end up in a deadlock.
The solution used in this driver is to queue a kernel-mode APC first, from which ZwMapViewOfSection is called. This kernel-mode APC is triggered right before the kernel-to-user-mode transition, so the internal NtMapViewOfSection call won't be on the call stack anymore (and therefore, AddressCreationLock will be unlocked).
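The ordering problem can be shown with a toy model in which the mapping path takes a non-recursive lock. Everything here is hypothetical (thread_model, map_view, nt_map_view_of_section and return_to_user_mode are illustrative names): map_view reports failure where a real recursive acquisition would deadlock, and the queued routine plays the role of the kernel-mode APC that runs on the way back to user mode.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4

typedef struct thread_model thread_model;
typedef void (*kernel_apc_routine)(thread_model *t);

/* Hypothetical per-thread state for the model. */
struct thread_model {
    bool address_creation_lock_held;   /* models EPROCESS->AddressCreationLock */
    kernel_apc_routine pending[MAX_PENDING];
    size_t pending_count;
    bool section_mapped;
};

/* Stand-in for ZwMapViewOfSection. The lock is not recursive, so taking it
   while it is already held would deadlock; the model returns false instead. */
static bool map_view(thread_model *t)
{
    if (t->address_creation_lock_held)
        return false;
    t->address_creation_lock_held = true;
    t->section_mapped = true;
    t->address_creation_lock_held = false;
    return true;
}

/* Kernel-mode APC routine: performs the mapping once it is safe. */
static void map_from_kernel_apc(thread_model *t)
{
    (void)map_view(t);
}

/* Image-load callback: runs while the outer mapping holds the lock,
   so it only queues the kernel-mode APC. */
static void image_load_callback(thread_model *t)
{
    if (t->pending_count < MAX_PENDING)
        t->pending[t->pending_count++] = map_from_kernel_apc;
}

/* Outer NtMapViewOfSection: takes the lock, fires the notification, releases. */
static void nt_map_view_of_section(thread_model *t)
{
    t->address_creation_lock_held = true;
    image_load_callback(t);
    t->address_creation_lock_held = false;
}

/* Kernel-to-user transition: pending kernel-mode APCs are delivered here,
   after NtMapViewOfSection has returned and released the lock. */
static void return_to_user_mode(thread_model *t)
{
    for (size_t i = 0; i < t->pending_count; i++)
        t->pending[i](t);
    t->pending_count = 0;
}
```

Calling map_view directly from image_load_callback would hit the held lock; deferring it means the section is only mapped inside return_to_user_mode, when nothing on the call stack owns the lock.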
- Injection of our DLL is triggered on the first DLL load that happens after all the important system DLLs (mentioned above) are loaded.
In the case of native processes, the code flow is as follows:
  - process.exe is created (process create notification)
  - process.exe is loaded (image load notification)
  - ntdll.dll is loaded (image load notification)
  - kernel32.dll is loaded (image load notification + injection happens here)
In the case of WoW64 processes, the code flow is as follows:
  - process.exe is created (process create notification)
  - process.exe is loaded (image load notification)
  - ntdll.dll is loaded (image load notification)
  - wow64.dll is loaded (image load notification)
  - wow64cpu.dll is loaded (image load notification)
  - wow64win.dll is loaded (image load notification)
  - ntdll.dll is loaded (image load notification; note that this is the 32-bit ntdll.dll)
  - kernel32.dll is loaded (image load notification + injection happens here)
Note that the load of kernel32.dll is used as an example. In fact, the load of any DLL will trigger the injection. But in practice, kernel32.dll is loaded into every Windows process, even if:
  - it has no import table
  - it doesn't depend on kernel32.dll
  - it depends only on ntdll.dll (covered in the previous point, I just wanted to make that crystal-clear)
  - it is a console application
Also note that the order of loaded DLLs mentioned above might not reflect the exact order the OS follows.
The only processes that won't be injected by this method are:
  - native processes (such as csrss.exe)
  - pico processes (such as applications running inside Windows Subsystem for Linux)
Injection of these processes is not in the scope of this project.
The injected user-mode APC is then force-delivered by calling KeTestAlertThread(UserMode). This call internally checks whether any user-mode APCs are queued and, if so, sets the Thread->ApcState.UserApcPending variable to TRUE. Because of this, the kernel immediately delivers this user-mode APC (via KiDeliverApc) on the next transition from kernel mode to user mode.
If we didn't force the delivery of the APC, the APC would be delivered when the thread enters the alertable state. (There are two alertable states per thread, one for kernel mode and one for user mode; this paragraph is about Thread->Alerted[UserMode] == TRUE.) Luckily, this happens when the Windows loader in ntdll.dll finishes its job and gives control to the application, specifically by calling NtAlertThread in the LdrpInitialize (or _LdrpInitialize) function. So even if we didn't force the APC, our DLL would still be injected before the main execution takes place.
NOTE: This means that if we didn't force delivery of the APC ourselves, the APC would be delivered BEFORE main/WinMain is executed, but AFTER all TLS callbacks are executed. This is because TLS callbacks are also executed in the early process initialization stage, within the LdrpInitialize function.
This behavior is configurable in this project by the ForceUserApc variable (by default it's TRUE).
NOTE: Some badly written drivers try to inject a DLL into processes by queuing an APC at the wrong time. For example:
  - Queuing an APC that injects a DLL which doesn't depend only on ntdll.dll right when ntdll.dll is mapped
  - Queuing an APC that injects a DLL which depends on kernel32.dll right when kernel32.dll is mapped (but not loaded!)
Such injection will actually work as long as nobody tries to forcefully deliver user-mode APCs. Because this driver triggers immediate delivery of user-mode APCs (all of them; you can't pick which should be delivered), the APC of another driver might be triggered too. If such an APC consisted of, say, calling LoadLibraryA from kernel32.dll while kernel32.dll is not yet fully loaded (just mapped), that APC would fail. And because this injection happens in the early process initialization stage, the error would be considered critical and the process start would fail. Since basically every process is injected, the start of every process would fail, making the system unusable.
The reason why our DLL is not injected immediately from the ntdll.dll image load callback is simple: the image load callback is called when the DLL is mapped into the process, and at this stage the DLL is not fully initialized. The initialization takes place after this callback (in user mode, obviously). If we injected the LdrLoadDll call before ntdll.dll is initialized, the call would fail somewhere inside that function, because some variable it relies on would not be initialized yet.
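The effect of ForceUserApc on ordering can be sketched as a timeline. This is a simplified model, not kernel code: apc_thread and the helper functions are hypothetical, queued_user_apcs counts every user APC queued on the thread (including other drivers' ones, since forcing delivers all of them), and the loader is reduced to three steps: TLS callbacks, the NtAlertThread call at the end of LdrpInitialize, and main. In the log, 'A' marks the injection APC, 'T' the TLS callbacks and 'M' main/WinMain.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical thread model; `log` records the order of events. */
typedef struct {
    int queued_user_apcs;   /* all user APCs queued on the thread, ours and others' */
    bool user_apc_pending;  /* models Thread->ApcState.UserApcPending */
    char log[8];
} apc_thread;

static void log_event(apc_thread *t, char event)
{
    size_t n = strlen(t->log);
    if (n + 1 < sizeof t->log) {
        t->log[n] = event;
        t->log[n + 1] = '\0';
    }
}

/* Runs every queued user APC (models user-mode delivery by KiDeliverApc). */
static void deliver_user_apcs(apc_thread *t)
{
    while (t->queued_user_apcs > 0) {
        t->queued_user_apcs--;
        log_event(t, 'A');
    }
    t->user_apc_pending = false;
}

/* Models KeTestAlertThread(UserMode). */
static void test_alert_thread(apc_thread *t)
{
    if (t->queued_user_apcs > 0)
        t->user_apc_pending = true;
}

/* Process start as seen from the injection point (image load of kernel32.dll). */
static void run_process_start(apc_thread *t, bool force_user_apc)
{
    t->queued_user_apcs = 1;            /* driver queues the injection APC */
    if (force_user_apc)
        test_alert_thread(t);

    if (t->user_apc_pending)            /* next kernel-to-user transition */
        deliver_user_apcs(t);

    log_event(t, 'T');                  /* TLS callbacks inside LdrpInitialize */
    deliver_user_apcs(t);               /* thread becomes alertable (NtAlertThread) */
    log_event(t, 'M');                  /* main / WinMain */
}
```

With forcing, the log reads "ATM"; without it, "TAM", matching the NOTE above about TLS callbacks running before a non-forced APC.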
- Injection of protected processes is simply skipped, as it triggers code-integrity errors. Such processes are detected by the PsIsProtectedProcess function. If you're curious about a workaround for this issue (temporarily unprotecting these processes), you can peek into the Blackbone source code. Keep in mind that unprotecting protected processes requires manipulating undocumented structures, which change dramatically between Windows versions.
- Injection of an x86 DLL into WoW64 processes is handled via the PsWrapApcWow64Thread(&NormalContext, &NormalRoutine) call. This function essentially alters the provided arguments in a way (not covered here) that lets KiUserApcDispatcher in the x64 ntdll.dll recognize such APCs and handle them differently. Such APCs are handled internally by calling Wow64ApcRoutine (from wow64.dll). This function then emulates queuing of a "32-bit APC" and resumes its execution in KiUserApcDispatcher in the x86 ntdll.dll.
- Injection of an x64 DLL into WoW64 processes is tricky on its own, and SentinelOne wrote an excellent 3-part blog post series on how to achieve that:
  - https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-1
  - https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-2
  - https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-3
In short, if you try to use the same approach as mentioned above (injecting a small stub which calls LdrLoadDll) for injecting an x64 DLL into a WoW64 process, you will run into problems with Control Flow Guard on Windows 10.
  - On an x64 system, CFG maintains 2 bitmaps for WoW64 processes:
    - One for the "x86 address space" (used when checking execution of < 4 GB memory)
    - One for the "x64 address space" (used when checking execution of >= 4 GB memory)
  - You cannot allocate memory in the > 4 GB range (even from kernel mode), because of a VAD that reserves this memory range
    - You can theoretically unlink such a VAD from EPROCESS->VadRoot and decrement EPROCESS->VadCount, but that's strongly discouraged
  - That means that when you allocate memory inside a WoW64 process (even from kernel mode) or change its protection, the x86 CFG bitmap is used
  - The x64 ntdll.dll is mapped above 4 GB, therefore the KiUserApcDispatcher function is also located at a > 4 GB address
  - Before KiUserApcDispatcher (indirectly) calls the NormalRoutine provided to the KeInitializeApc function, it checks via CFG whether NormalRoutine can be executed
  - Because KiUserApcDispatcher is called from a > 4 GB address, this CFG check is performed on the x64 CFG bitmap, and it fails, because our allocated memory is in the < 4 GB range
    - You can theoretically work around this by disabling CFG with various hacks, but that's also strongly discouraged
    - ZwProtectVirtualMemory and even ZwSetInformationVirtualMemory won't help you, because these APIs operate on the x86 CFG bitmap as well if you feed them a < 4 GB address
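The CFG failure boils down to which bitmap gets consulted. The following model is purely illustrative: wow64_cfg, cfg_mark_executable and cfg_check are hypothetical, each bitmap is reduced to a short list of valid targets, all addresses are made up, and, following the description above, the bitmap is chosen by the address of the code performing the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB     0x100000000ULL
#define MAX_TARGETS 8

/* Hypothetical model: each CFG bitmap is reduced to a list of valid targets. */
typedef struct {
    uint64_t x86_valid[MAX_TARGETS]; size_t x86_count; /* "x86 address space" bitmap */
    uint64_t x64_valid[MAX_TARGETS]; size_t x64_count; /* "x64 address space" bitmap */
} wow64_cfg;

static bool contains(const uint64_t *set, size_t count, uint64_t addr)
{
    for (size_t i = 0; i < count; i++)
        if (set[i] == addr)
            return true;
    return false;
}

/* Allocating executable memory (or changing protection) in a WoW64 process:
   nothing can be allocated at or above 4 GB, and only the x86 bitmap is updated. */
static bool cfg_mark_executable(wow64_cfg *cfg, uint64_t addr)
{
    if (addr >= FOUR_GB || cfg->x86_count == MAX_TARGETS)
        return false;
    cfg->x86_valid[cfg->x86_count++] = addr;
    return true;
}

/* The bitmap is selected by where the checking code lives. */
static bool cfg_check(const wow64_cfg *cfg, uint64_t caller, uint64_t target)
{
    if (caller >= FOUR_GB)
        return contains(cfg->x64_valid, cfg->x64_count, target);
    return contains(cfg->x86_valid, cfg->x86_count, target);
}
```

A stub marked executable below 4 GB passes a check made from x86 code but fails the same check made from the x64 KiUserApcDispatcher, while a target already valid in the x64 bitmap (code inside the x64 ntdll.dll) passes it.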
The solution outlined in the SentinelOne blog post rests on calling LdrLoadDll of the x64 ntdll.dll directly from the user APC dispatcher, effectively making NormalRoutine point to the address of LdrLoadDll. The issue here is that PKNORMAL_ROUTINE takes only 3 parameters, while LdrLoadDll takes 4:
takes 4.typedef VOID (NTAPI *PKNORMAL_ROUTINE) ( _In_ PVOID NormalContext, _In_ PVOID SystemArgument1, _In_ PVOID SystemArgument2 ); NTSTATUS NTAPI LdrLoadDll ( _In_ PWSTR SearchPath OPTIONAL, _In_ PULONG DllCharacteristics OPTIONAL, _In_ PUNICODE_STRING DllName, _Out_ PVOID *BaseAddress );
Note that the 4th parameter of LdrLoadDll must point to some valid address where the BaseAddress will be stored. The devil is always in the details; the solution takes advantage of a "couple of lucky coincidences":
  - KiUserApcDispatcher is a function expecting RSP to point to the CONTEXT structure
  - From this structure, the values P1Home ... P4Home are fetched:
    - P1Home (moved to RCX) represents NormalContext
    - P2Home (moved to RDX) represents SystemArgument1
    - P3Home (moved to R8) represents SystemArgument2
    - P4Home (moved to RAX) represents NormalRoutine
  - Also, R9 is set to point to RSP (the CONTEXT structure)
    - Note that RCX, RDX, R8 and R9 are used as the first four function parameters in the Microsoft x64 calling convention
  - KiUserApcDispatcher calls KiUserCallForwarder
    - KiUserCallForwarder checks whether RAX points to a valid execution target (in the x64 CFG bitmap)
    - KiUserCallForwarder calls the function pointed to by RAX with the parameters RCX, RDX, R8 and R9
  - This is basically the equivalent of calling the APC's PKNORMAL_ROUTINE: NormalRoutine(NormalContext, SystemArgument1, SystemArgument2)
    - ...except that, because R9 is set, it is in fact called like this: NormalRoutine(NormalContext, SystemArgument1, SystemArgument2, ContinueContext)
  - Therefore, if we queue the user-mode APC like this:
    - NormalRoutine = address of LdrLoadDll in the 64-bit ntdll.dll
    - NormalContext = NULL (translates to the 1st parameter of LdrLoadDll (SearchPath))
    - SystemArgument1 = NULL (translates to the 2nd parameter of LdrLoadDll (DllCharacteristics))
    - SystemArgument2 = pointer to UNICODE_STRING DllName (translates to the 3rd parameter of LdrLoadDll (DllName))
    - (as mentioned above, the 4th parameter (BaseAddress) is provided automatically by KiUserApcDispatcher)
  - ...it will effectively result in the following call: LdrLoadDll(NULL, 0, &DllName, &ContinueContext)
  - LdrLoadDll overwrites the first 8 bytes of the CONTEXT structure, which happen to be its P1Home field
  - It doesn't break anything, because this field has already been used (when fetching NormalContext) and is no longer accessed (not even by ZwContinue)
NOTE: Not all function calls from the x86 NTDLL end up in the x64 NTDLL. This is because some functions are fully implemented on their own in both the x86 and the x64 NTDLL. This applies mainly to functions that don't require any syscall, i.e. Rtl* functions. For example, if you wanted to hook RtlDecompressBuffer in a WoW64 process, hooking that function in the x64 NTDLL wouldn't have any effect and the hooked function would never be called.
- Injection of x86 processes on x86 Windows is handled exactly the same way as injection of WoW64 processes with an x86 DLL on x64 Windows (with the exception of the PsWrapApcWow64Thread call).
- Injection of x64 processes is handled exactly the same way as injection of WoW64 processes with an x64 DLL.
- Finally, as mentioned at the beginning, the injected DLL logs calls of hooked functions with ETW. Because functions such as EventRegister, EventWriteString, ... are located in advapi32.dll, we can't use them from our DLL, which depends only on NTDLL. Luckily, ETW support is built into ntdll.dll too. In fact, most of the Event* functions in advapi32.dll are simply redirected to the EtwEvent* functions in ntdll.dll without any change to the arguments! Therefore, we can simply mock the Event* functions and just include the <evntprov.h> header:

    //
    // Include support for ETW logging.
    // Note that the following functions are mocked, because they're
    // located in advapi32.dll. Fortunately, advapi32.dll simply
    // redirects calls to these functions to ntdll.dll.
    //

    #define EventActivityIdControl  EtwEventActivityIdControl
    #define EventEnabled            EtwEventEnabled
    #define EventProviderEnabled    EtwEventProviderEnabled
    #define EventRegister           EtwEventRegister
    #define EventSetInformation     EtwEventSetInformation
    #define EventUnregister         EtwEventUnregister
    #define EventWrite              EtwEventWrite
    #define EventWriteEndScenario   EtwEventWriteEndScenario
    #define EventWriteEx            EtwEventWriteEx
    #define EventWriteStartScenario EtwEventWriteStartScenario
    #define EventWriteString        EtwEventWriteString
    #define EventWriteTransfer      EtwEventWriteTransfer

    #include <evntprov.h>
...easy, wasn't it?
The following example was performed on Windows 10 x64.
Enable the Test-Signing boot configuration option (note that you'll need administrative privileges to use bcdedit) and reboot the machine:
bcdedit /set testsigning on
shutdown /r /t 0
Now open an administrator command line and run the following command:
injldr -i
The -i option installs the driver. After the driver is installed, it waits for newly created processes.
When a new process is created, it gets injected. Prepare some x86 application, for example PuTTY, and run it.
With Process Explorer, we can check that our x64 DLL is indeed injected into this x86 application.
Also, immediately after injldr is started, it starts an ETW tracing session and prints information about calls of hooked functions.
You can exit injldr by pressing Ctrl+C. You can then run injldr without any parameters to start only the tracing session. If you wish to uninstall the driver, run injldr -u.
By default, this driver injects an x64 DLL into WoW64 processes. If you wish to change this behavior and inject an x86 DLL instead, set UseWow64Injection to TRUE. Also, don't forget to compile injdll for the x86 architecture and place it in the same directory as injldr.exe.
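The injection paths described in this document can be summarised as a decision function. This is a hypothetical sketch (process_kind, injection_plan and choose_plan are not names from the project); it only encodes the rules stated above: x86 DLLs are loaded through the small LdrLoadDll stub, wrapped with PsWrapApcWow64Thread only for WoW64 processes, x64 DLLs are loaded by pointing NormalRoutine at LdrLoadDll in the 64-bit ntdll.dll, and UseWow64Injection switches WoW64 processes to the x86 path.

```c
#include <stdbool.h>

/* Hypothetical classification of the target process. */
typedef enum { PROC_NATIVE_X86, PROC_NATIVE_X64, PROC_WOW64 } process_kind;

/* Hypothetical summary of how the DLL gets loaded. */
typedef struct {
    int dll_bits;                        /* 32 or 64 */
    bool use_shellcode_stub;             /* stub in a section that calls LdrLoadDll */
    bool wrap_apc_for_wow64;             /* PsWrapApcWow64Thread */
    bool normal_routine_is_ldr_load_dll; /* NormalRoutine = 64-bit LdrLoadDll */
} injection_plan;

static injection_plan choose_plan(process_kind kind, bool use_wow64_injection)
{
    injection_plan plan = {0};

    if (kind == PROC_NATIVE_X86 || (kind == PROC_WOW64 && use_wow64_injection)) {
        /* x86 DLL: same path for x86 Windows and WoW64, minus the APC wrapping. */
        plan.dll_bits = 32;
        plan.use_shellcode_stub = true;
        plan.wrap_apc_for_wow64 = (kind == PROC_WOW64);
    } else {
        /* x64 DLL: native x64 processes, and WoW64 processes by default. */
        plan.dll_bits = 64;
        plan.normal_routine_is_ldr_load_dll = true;
    }
    return plan;
}
```

For example, choose_plan(PROC_WOW64, false) yields the default x64-DLL plan, while choose_plan(PROC_WOW64, true) yields the wrapped x86-DLL plan.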
This software is open-source under the MIT license. See the LICENSE.txt file in this repository.
Dependencies are licensed by their own licenses.
If you find this project interesting, you can buy me a coffee
BTC 12hwTTPYDbkVqsfpGjrsVa7WpShvQn24ro
LTC LLDVqnBEMS8Tv7ZF1otcy56HDhkXVVFJDH