/stlkrn

C++ STL in the Windows Kernel with C++ Exception Support

Primary language: C++. License: MIT.

C++ STL in Windows Drivers

This project uses the MSVC C++ STL in a Windows kernel driver. In this solution, jxystl.lib is implemented as a kernel-tuned, pool type/tag aware template library that uses the MSVC C++ STL under the hood.

#include <wdm.h>
#include <jxy/string.hpp>

extern "C"
NTSTATUS DriverEntry(
    PDRIVER_OBJECT DriverObject,
    PUNICODE_STRING RegistryPath)
{
    jxy::wstring<PagedPool, '0GAT'> helloWorld;

    try
    {
        helloWorld.assign(L"Hello, World!");
    }
    catch (const std::bad_alloc&)
    {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    return STATUS_SUCCESS;
}
1: kd> dv
   DriverObject = 0xffffca83`5380d300 Driver "\Driver\stlkrn"
   RegistryPath = 0xffffca83`5227f000 "\REGISTRY\MACHINE\SYSTEM\ControlSet001\Services\stlkrn"
     helloWorld = "Hello, World!"

The driver implemented in this solution, stlkrn.sys, uses various std namespace containers wrapped under the jxy namespace. The driver registers for process, thread, and image notifications, then uses modern C++ to track process, thread, and module contexts.

Exception Handling - vcrtl

Exception handling enables C++ objects to unwind when an exception is thrown. This is a core feature of C++ which gets little attention for kernel drivers. Microsoft does not natively support C++ exceptions for kernel drivers.

C++ exception handling is made possible by avakar's vcrtl library. This project would have been far more work without avakar's awesome contribution. For information on exception handling in Windows drivers, head over to avakar's vcrtl GitHub. Also, this page gives excellent details on exception handling on AMD64.

MSVC C++ STL Support - jxystl

Windows Kernel allocations are associated with a memory pool. Further, pool tagging is built into the Windows Kernel. Pool tagging facilitates tracking of allocations made by drivers. This tagging facility enables debugging and monitoring of allocations.

The jxy namespace, in this solution, empowers development of Windows drivers using the std namespace objects with pool typing and tagging.

The library opts not to implement "global" new/delete operators. It implements only new/delete operators with pool typing and tagging capability, so every allocation must specify a pool type and tag. If some functionality is used that would require a "global allocator", it will not link. This is an intentional design decision: no global allocators are used, and all allocations must specify a pool type and tag.
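The idea of tagged-only allocation can be sketched in portable user-mode C++. Everything in the `sketch` namespace below (`PoolKind`, `PoolInfo`, `live_allocations`, `destroy`) is hypothetical and is not the jxystl API; `std::malloc` stands in for a tagged kernel pool allocation. Unlike the driver build, a user-mode toolchain still provides the global operators, so this sketch cannot reproduce the intentional link failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch
{

// Hypothetical stand-in for the kernel's POOL_TYPE enumeration.
enum PoolKind { NonPaged = 0, Paged = 1 };

// Bundles the two values every allocation site must supply.
struct PoolInfo
{
    PoolKind pool;
    unsigned long tag;
};

// Number of live allocations, kept only so the sketch can be checked.
inline int& live_allocations()
{
    static int count = 0;
    return count;
}

}

// Placement-style operator new that requires a pool type and tag.
void* operator new(std::size_t size, sketch::PoolInfo info)
{
    assert(size != 0);
    (void)info; // A driver would hand the pool type and tag to the pool allocator.
    void* memory = std::malloc(size);
    if (memory == nullptr)
    {
        throw std::bad_alloc();
    }
    ++sketch::live_allocations();
    return memory;
}

// Matching placement delete; the compiler calls it if a constructor throws.
void operator delete(void* memory, sketch::PoolInfo) noexcept
{
    if (memory != nullptr)
    {
        std::free(memory);
        --sketch::live_allocations();
    }
}

namespace sketch
{

// Destroys an object created with the tagged operator new above.
template <typename T>
void destroy(T* object, PoolInfo info) noexcept
{
    if (object != nullptr)
    {
        object->~T();
        ::operator delete(object, info);
    }
}

}
```

A call site then reads `new (info) int(7)`, with `info` naming the pool and tag, and the object is released with `sketch::destroy(pointer, info)`.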

The jxy namespace implements allocators and deleters which conform to the standard for use in template containers. These allocators and deleters are pool type/tag aware. They require specifying the pool type and tag, and they prevent conversions/rebinding across pool types and tags. They should be used in place of the STL allocators.

jxy::allocator<T, PagedPool, '0GAT'>;
jxy::default_delete<T, PagedPool, '0GAT'>;
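As a rough illustration of how an allocator can carry the pool type and tag in its type and refuse conversion across them, here is a minimal user-mode sketch. `sketch::allocator` and `sketch::PoolKind` are hypothetical names, `std::malloc` is a placeholder for a tagged pool allocation, and the tag is written as a hex constant (0x30474154 is the value shown for '0GAT' in the debugger output in this document). It is not the jxy implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <vector>

namespace sketch
{

// Hypothetical stand-in for the kernel's POOL_TYPE enumeration.
enum PoolKind { NonPaged = 0, Paged = 1 };

template <typename T, PoolKind t_Pool, unsigned long t_Tag>
struct allocator
{
    using value_type = T;

    // Rebinding changes only the element type; pool and tag stay fixed.
    template <typename U>
    struct rebind
    {
        using other = allocator<U, t_Pool, t_Tag>;
    };

    allocator() noexcept = default;

    // Converting construction exists only for the same pool and tag.
    template <typename U>
    allocator(const allocator<U, t_Pool, t_Tag>&) noexcept {}

    T* allocate(std::size_t count)
    {
        // Guard against overflow in the byte count computed below.
        assert(count <= static_cast<std::size_t>(-1) / sizeof(T));
        void* memory = std::malloc(count * sizeof(T)); // user-mode placeholder
        if (memory == nullptr)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(memory);
    }

    void deallocate(T* memory, std::size_t) noexcept
    {
        std::free(memory);
    }
};

// Allocators with the same pool and tag are interchangeable.
template <typename T, typename U, PoolKind P, unsigned long G>
bool operator==(const allocator<T, P, G>&, const allocator<U, P, G>&) noexcept
{
    return true;
}

template <typename T, typename U, PoolKind P, unsigned long G>
bool operator!=(const allocator<T, P, G>&, const allocator<U, P, G>&) noexcept
{
    return false;
}

}
```

Because the pool and tag are non-type template parameters, the explicit `rebind` member is needed; the default rebinding in `std::allocator_traits` only handles allocators whose remaining parameters are types. A `std::vector<int, sketch::allocator<int, sketch::Paged, 0x30474154UL>>` then works like any other vector, while an allocator with a different tag is a distinct, non-convertible type.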

jxystl.lib implements the "fill" functionality needed to use MSVC STL containers. The implementations (in msvcfill.cpp) are written with the kernel in mind. This functionality enables the MSVC STL containers to link to kernel-appropriate functionality. It also means that if some std container functionality is used that doesn't have "fill" functionality behind it, linking will fail. This is an intentional design decision, so that any implementations are thought through for use in the kernel.

CRT initialization and atexit functionality are intentionally not supported. The order of CRT initialization is unclear and non-obvious. When a kernel driver loads, global data should be clearly set up and torn down during driver load and unload. Global CRT initialization "hides" this initialization in a non-obvious way. Further, the compiler does not emit the synchronization needed for local static initialization of C++ objects, and doing so would introduce non-obvious synchronization in the kernel. Lack of CRT initialization and atexit support is an intentional design decision. I strongly recommend avoiding these features when developing kernel drivers.
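One way to keep global setup and teardown explicit, without relying on CRT initialization, is to reserve raw storage and construct the object by hand. The sketch below is a user-mode illustration under that assumption; `Registry`, `initialize_globals`, and `teardown_globals` are hypothetical names and this is not necessarily how stlkrn manages its singletons.

```cpp
#include <cassert>
#include <new>
#include <string>

namespace sketch
{

// Hypothetical global object that needs a constructor and destructor.
struct Registry
{
    std::string name;

    explicit Registry(const char* value) : name(value) {}
};

// Raw, zero-initialized storage: no constructor runs before "driver load".
alignas(Registry) unsigned char g_storage[sizeof(Registry)];
Registry* g_registry = nullptr;

// Called explicitly from the load path (DriverEntry in a real driver).
void initialize_globals()
{
    assert(g_registry == nullptr); // Must not be initialized twice.
    g_registry = ::new (static_cast<void*>(g_storage)) Registry("example");
}

// Called explicitly from the unload path (DriverUnload in a real driver).
void teardown_globals() noexcept
{
    if (g_registry != nullptr)
    {
        g_registry->~Registry();
        g_registry = nullptr;
    }
}

}
```

The lifetime of the global is then visible at the two call sites, rather than being decided by initializer ordering or atexit registration.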

As an example, the jxy namespace "wraps" std::vector and forces use of pool types and tags:

namespace jxy
{

template <typename T, 
          POOL_TYPE t_PoolType, 
          ULONG t_PoolTag, 
          typename TAllocator = jxy::allocator<T, t_PoolType, t_PoolTag>> 
using vector = std::vector<T, TAllocator>;

}

jxy::vector<int, PagedPool, '0GAT'> integers;
stlkrn!DriverEntry+0xea:
0: kd> dx integers
integers                 : { size=10 } [Type: std::vector<int,jxy::details::allocator<int,1,809976148> >]
    [<Raw View>]     [Type: std::vector<int,jxy::details::allocator<int,1,809976148> >]
    [capacity]       : 10
    [allocator]      : {...} [Type: std::_Compressed_pair<jxy::details::allocator<int,1,809976148>,std::_Vector_val<std::_Simple_types<int> >,1>]
    [0]              : 1 [Type: int]
    [1]              : 2 [Type: int]
    [2]              : 3 [Type: int]
    [3]              : 4 [Type: int]
    [4]              : 5 [Type: int]
    [5]              : 6 [Type: int]
    [6]              : 7 [Type: int]
    [7]              : 8 [Type: int]
    [8]              : 9 [Type: int]
    [9]              : 10 [Type: int]

Below is a table of functionality under the jxy namespace:

| jxylib | STL equivalent | Notes |
| --- | --- | --- |
| jxy::allocator | std::allocator | |
| jxy::default_delete | std::default_delete | |
| jxy::unique_ptr | std::unique_ptr | |
| jxy::shared_ptr | std::shared_ptr | |
| jxy::basic_string | std::basic_string | |
| jxy::string | std::string | |
| jxy::wstring | std::wstring | |
| jxy::vector | std::vector | |
| jxy::map | std::map | |
| jxy::mutex | std::mutex | Uses KGUARDED_MUTEX |
| jxy::shared_mutex | std::shared_mutex | Uses EX_PUSH_LOCK |
| jxy::unique_lock | std::unique_lock | |
| jxy::shared_lock | std::shared_lock | |
| jxy::scope_resource | None | Similar to std::experimental::unique_resource |
| jxy::scope_exit | None | Similar to std::experimental::scope_exit |
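For readers unfamiliar with the scope-exit idiom named in the last row, a minimal sketch follows. `sketch::scope_exit`, `make_scope_exit`, and `run_example` are illustrative names; the real jxy::scope_exit interface may differ.

```cpp
#include <cassert>
#include <utility>

namespace sketch
{

// Runs a callable when the guard leaves scope, unless released first.
template <typename TCallable>
class scope_exit
{
public:
    explicit scope_exit(TCallable callable) : m_callable(std::move(callable)) {}

    // Moving transfers responsibility for the cleanup to the new guard.
    scope_exit(scope_exit&& other)
        : m_callable(std::move(other.m_callable)), m_active(other.m_active)
    {
        other.release();
    }

    scope_exit(const scope_exit&) = delete;
    scope_exit& operator=(const scope_exit&) = delete;
    scope_exit& operator=(scope_exit&&) = delete;

    ~scope_exit()
    {
        if (m_active)
        {
            m_callable();
        }
    }

    // Cancels the cleanup.
    void release() noexcept { m_active = false; }

private:
    TCallable m_callable;
    bool m_active = true;
};

template <typename TCallable>
scope_exit<TCallable> make_scope_exit(TCallable callable)
{
    return scope_exit<TCallable>(std::move(callable));
}

// Usage: returns how many times the cleanup ran.
inline int run_example()
{
    int cleanups = 0;
    {
        auto guard = make_scope_exit([&cleanups] { ++cleanups; });
        assert(cleanups == 0); // Cleanup has not run while the guard is alive.
    }
    return cleanups;
}

}
```

The guard also runs its callable during exception unwinding, which is why the idiom pairs naturally with the exception support described earlier.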

Practical Usage - stlkrn.sys

The stlkrn project is a Windows Driver that uses jxylib to implement process, thread, and module tracking in the Windows Kernel.

stlkrn.sys registers for process, thread, and image notifications using functionality exported by ntoskrnl. Using these callbacks it tracks processes, threads, and image loads in various objects which use jxy::map, jxy::shared_mutex, jxy::wstring, and more.

The driver has two singletons, jxy::ProcessMap and jxy::ThreadMap. They are constructed when the driver loads (DriverEntry) and torn down when the driver unloads (DriverUnload). It is worth noting that each process tracked in the jxy::ProcessMap (implemented as jxy::ProcessContext) also manages its own jxy::ThreadMap. Each "context" (jxy::ProcessContext, jxy::ThreadContext, and jxy::ModuleContext) is a shared (referenced) object (jxy::shared_ptr). Therefore, the thread context that exists in the thread map singleton is the same context associated with the process context.
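The sharing arrangement can be illustrated with standard user-mode types. This is a simplified sketch, not the driver's code: the `sketch` types and member names are hypothetical, and `std::shared_mutex`, `std::map`, and `std::make_shared` stand in for their jxy counterparts (C++17 is assumed for `std::shared_mutex`).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>

namespace sketch
{

struct ThreadContext
{
    std::uint32_t process_id;
    std::uint32_t thread_id;
};

// Maps a TID to a shared thread context under a reader/writer lock.
class ThreadMap
{
public:
    bool insert(std::shared_ptr<ThreadContext> thread)
    {
        assert(thread != nullptr);
        std::unique_lock<std::shared_mutex> lock(m_lock);
        const std::uint32_t id = thread->thread_id;
        return m_threads.emplace(id, std::move(thread)).second;
    }

    std::shared_ptr<ThreadContext> find(std::uint32_t thread_id) const
    {
        std::shared_lock<std::shared_mutex> lock(m_lock);
        auto it = m_threads.find(thread_id);
        if (it == m_threads.end())
        {
            return nullptr;
        }
        return it->second;
    }

private:
    mutable std::shared_mutex m_lock;
    std::map<std::uint32_t, std::shared_ptr<ThreadContext>> m_threads;
};

struct ProcessContext
{
    std::uint32_t process_id;
    ThreadMap threads; // Per-process thread map.
};

// Inserts one context into a global map and a per-process map, then
// reports whether both lookups return the same object.
inline bool same_context_in_both_maps()
{
    ThreadMap global_threads;
    auto process = std::make_shared<ProcessContext>();
    process->process_id = 4;

    auto thread = std::make_shared<ThreadContext>(ThreadContext{4, 8});
    global_threads.insert(thread);
    process->threads.insert(thread);

    return global_threads.find(8) == process->threads.find(8);
}

}
```

Because both maps hold a `shared_ptr` to one object, the context stays alive until it has been removed from both maps and every outstanding lookup result has been dropped.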

Key components of stlkrn.sys:

| Object | Purpose | Source | Notes |
| --- | --- | --- | --- |
| jxy::ProcessContext | Information for a process running on the system. | process_context.hpp/cpp | Uses jxy::wstring. Has thread (jxy::ThreadMap) and module (jxy::ModuleMap) map members. |
| jxy::ThreadContext | Information for a thread running on the system. | thread_context.hpp/cpp | Uses std::atomic. |
| jxy::ModuleContext | Information for an image loaded in a given process. | module_context.hpp/cpp | Uses jxy::wstring and jxy::shared_mutex. |
| jxy::ProcessMap | Singleton, maps shared jxy::ProcessContext objects to a PID. | process_map.hpp/cpp | Singleton is accessed via jxy::GetProcessMap. Uses jxy::shared_mutex and jxy::map. |
| jxy::ThreadMap | Maps shared jxy::ThreadContext objects to a TID. | thread_map.hpp/cpp | The global thread table (singleton) is accessed via jxy::GetThreadMap. Each jxy::ProcessContext also has a thread map which is accessed through jxy::ProcessContext::GetThreads. Uses jxy::shared_mutex and jxy::map. |
| jxy::ModuleMap | Maps shared jxy::ModuleContext objects to a loaded image's extents (base and end address). | module_map.hpp/cpp | Each process context has a module map member. Loaded images for a given process are tracked using this object. Uses jxy::shared_mutex and jxy::map. |
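Mapping module contexts to image extents suggests an address-range lookup. The sketch below shows one common way to do that with an ordered map keyed by base address. It is an assumption about the approach, not the driver's actual code: the names are hypothetical, the range is treated as half-open [base, end), and locking is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch
{

struct ModuleContext
{
    std::uintptr_t base; // First address of the image.
    std::uintptr_t end;  // One past the last address (assumed half-open).
    std::wstring name;
};

class ModuleMap
{
public:
    void insert(std::shared_ptr<ModuleContext> module)
    {
        assert(module != nullptr && module->base < module->end);
        const std::uintptr_t base = module->base;
        m_modules[base] = std::move(module);
    }

    // Returns the module whose [base, end) range contains address, if any.
    std::shared_ptr<ModuleContext> find(std::uintptr_t address) const
    {
        // upper_bound yields the first module whose base is above address,
        // so the candidate is the entry just before it.
        auto it = m_modules.upper_bound(address);
        if (it == m_modules.begin())
        {
            return nullptr;
        }
        --it;
        if (address < it->second->end)
        {
            return it->second;
        }
        return nullptr;
    }

private:
    std::map<std::uintptr_t, std::shared_ptr<ModuleContext>> m_modules;
};

}
```

This kind of lookup relies on key ordering, which an ordered tree provides directly.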

std::unordered_map would have been a better choice over the ordered tree (std::map) for the object maps. There is a reason this isn't used (see TODO section).

stlkrn!jxy::nt::CreateProcessNotifyRoutine+0xa6:
3: kd> dx proc
proc                 : {...} [Type: std::shared_ptr<jxy::ProcessContext>]
    [<Raw View>]     [Type: std::shared_ptr<jxy::ProcessContext>]
    [ptr]            : 0xffffaa020d73cf70 [Type: jxy::ProcessContext *]
    [control block]  : custom deleter, custom allocator [Type: std::_Ref_count_resource_alloc<jxy::ProcessContext *,jxy::details::default_delete<jxy::ProcessContext,1,1668307018>,jxy::details::allocator<jxy::ProcessContext,1,1668307018> > (derived from std::_Ref_count_base)]
    [+0x000] m_ProcessId      : 0x2760 [Type: unsigned int]
    [+0x004] m_SessionId      : 0x2 [Type: unsigned int]
    [+0x008] m_ParentProcessId : 0xcc4 [Type: unsigned int]
    [+0x010] m_FileName       : "\Device\HarddiskVolume4\Windows\System32\cmd.exe" [Type: std::basic_string<unsigned short,std::char_traits<unsigned short>,jxy::details::allocator<unsigned short,1,1852856394> >]
    [+0x030] m_FilePart       : "cmd.exe" [Type: std::basic_string<unsigned short,std::char_traits<unsigned short>,jxy::details::allocator<unsigned short,1,1886410826> >]
    [+0x050] m_CreatorProcessId : 0x1b08 [Type: unsigned int]
    [+0x054] m_CreatorThreadId : 0x26a0 [Type: unsigned int]
    [+0x058] m_Threads        [Type: jxy::ThreadMap]
    [+0x070] m_Modules        [Type: jxy::ModuleMap]

TODO

jxy::shared_ptr is supported through std::shared_ptr directly, but this implementation could be improved. Internally, std::shared_ptr will use a global new allocation in some circumstances. To avoid this, jxy::make_shared is implemented to associate the appropriate pool typed/tagged allocator and deleter. This introduces an extra control block allocation for the shared reference, which is what std::make_shared aims to avoid. Unfortunately, attaching a control block to the container is not public functionality. This could be improved with some support from MSVC or by hand-rolling a jxy::shared_ptr that is better tuned for kernel use.
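The two-allocation cost described above can be seen in a small user-mode sketch that builds a `std::shared_ptr` from a raw pointer, a deleter, and an allocator. `counting_allocator`, `allocator_delete`, and `make_shared_two_step` are hypothetical names written for this illustration; the sketch only mirrors the general shape of the approach and is not the jxy::make_shared source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

namespace sketch
{

// Counts allocations so the cost of the approach can be observed.
inline int& allocation_count()
{
    static int count = 0;
    return count;
}

template <typename T>
struct counting_allocator
{
    using value_type = T;

    counting_allocator() noexcept = default;

    template <typename U>
    counting_allocator(const counting_allocator<U>&) noexcept {}

    T* allocate(std::size_t count)
    {
        void* memory = std::malloc(count * sizeof(T)); // placeholder for a pool allocation
        if (memory == nullptr)
        {
            throw std::bad_alloc();
        }
        ++allocation_count();
        return static_cast<T*>(memory);
    }

    void deallocate(T* memory, std::size_t) noexcept { std::free(memory); }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return false; }

// Deleter that destroys the object and returns memory to the same allocator.
template <typename T>
struct allocator_delete
{
    void operator()(T* object) const noexcept
    {
        object->~T();
        counting_allocator<T>().deallocate(object, 1);
    }
};

template <typename T, typename... TArgs>
std::shared_ptr<T> make_shared_two_step(TArgs&&... args)
{
    counting_allocator<T> alloc;
    T* raw = alloc.allocate(1); // Allocation 1: the object itself.
    assert(raw != nullptr);
    try
    {
        ::new (static_cast<void*>(raw)) T(std::forward<TArgs>(args)...);
    }
    catch (...)
    {
        alloc.deallocate(raw, 1);
        throw;
    }
    // Allocation 2: the control block, obtained through the same allocator.
    // If that allocation throws, shared_ptr invokes the deleter on raw.
    return std::shared_ptr<T>(raw, allocator_delete<T>(), alloc);
}

}
```

Every allocation goes through the supplied allocator, so nothing falls back to a global operator new, at the price of a separate control block.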

I had wanted to include std::unordered_map initially; however, it uses ceilf. Floating point arithmetic in the Windows Kernel comes with some challenges, so for now it is omitted until an appropriate solution is designed.

Disclaimer

This solution is a passion project. At this time it is not intended for production code. x64 is well tested and stable; stlkrn.sys passes full Driver Verifier options (including randomized low resource simulation). Exception handling at or above dispatch level has been tested, but not in practical use cases. x86 has not been tested. Some functionality under the jxy namespace is incomplete, unused, or untested. Your mileage may vary. I would like to continue this work over time; if you find any issues or bugs, feel free to open issues against this repo.

Credits

This repository draws from some preexisting work. Credits to their authors.

  • C++ Exceptions in Windows Drivers
    This project implements parts of the Visual Studio runtime library that are needed for C++ exception handling. Currently, x86 and x64 platforms are supported.
  • Process Hacker Native API Headers
    Collection of Native API header files. Gathered from Microsoft header files and symbol files, as well as a lot of reverse engineering and guessing.