/stlkrn

C++ STL in the Windows Kernel with C++ Exception Support

Primary LanguageC++MIT LicenseMIT

C++ STL in Windows Drivers

This project uses MSVC C++ STL in a Windows Kernel Driver. In this solution jxystl.lib is implemented as a kernel-tuned, pool type/tag aware, template library and MSVC implementation. Which, under the hood, uses the MSVC C++ STL.

#include <wdm.h>
#include <jxy/string.hpp>

extern "C"
NTSTATUS DriverEntry(
    PDRIVER_OBJECT DriverObject,
    PUNICODE_STRING RegistryPath)
{
    jxy::wstring<PagedPool, '0GAT'> helloWorld;

    try
    {
        helloWorld.assign(L"Hello, World!");
    }
    catch (const std::bad_alloc&)
    {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    return STATUS_SUCCESS;
}
1: kd> dv
   DriverObject = 0xffffca83`5380d300 Driver "\Driver\stlkrn"
   RegistryPath = 0xffffca83`5227f000 "\REGISTRY\MACHINE\SYSTEM\ControlSet001\Services\stlkrn"
     helloWorld = "Hello, World!"

The driver implemented in this solution, stdkrn.sys, uses various std namespace containers, wrapped under the jxy namespace. This driver registers for process, thread, and image notifications; then uses modern C++ to track process contexts, thread contexts, and module contexts.

Exception Handling - vcrtl

Exception handling enables C++ objects to unwind when an exception is thrown. This is a core feature of C++ which gets little attention for kernel drivers. Microsoft does not natively support C++ exceptions for kernel drivers.

C++ exception handling is made possible by avakar's vcrtl libraray. This project would have been far more work without avakar's awesome contribution. For information on exception handling in Windows Drivers head over to avakar's vcrtl github. Also, this page gives excellent details on exception handling on AMD64.

MSVC C++ STL Support - jxystl

Windows Kernel allocations are associated with a memory pool. Further, pool tagging is built into the Windows Kernel. Pool tagging facilitates tracking of allocations made by drivers. This tagging facility enables debugging and monitoring of allocations.

The jxy namespace, in this solution, empowers development of Windows drivers using the std namespace objects with pool typing and tagging.

The library opts not to implement "global" new/delete operators. It implements only new/delete operators with pool typing and tagging capability. This requires specifying pool types and tags. If some functionality is used that would require a "global allocator" it will not link. This is an intentional design decision such that no global allocators are used, all allocations must specify a pool type and tag.

The jxy namespace implements allocators and deleters which conform to the standard for use in template containers. These allocators and deleters are pool type/tag aware. They require specifying the pool type and tag and prevent conversions/rebinding across tool types and tags - they should be used in place of the STL allocators.

jxy::allocator<T, PagedPool, '0GAT'>;
jxy::default_delete<T, PagedPool, '0GAT'>;

jxystl.lib implements necessary "fill" functionality for use of MSVC STL containers. The implementations (in msvcfill.cpp) are considerate to the kernel. This functionality enables the MSVC STL containers to link to kernel-appropriate functionality. This also means that if some std container functionality is used that doesn't have "fill" functionality behind it - the linker will fail. This is an intentional design decision such that any implementations are thought through for use in the kernel.

CRT initialization and atexit functionality is intentionally not supported. Order of CRT initialization is unclear and non-obvious. When a kernel driver loads global data should be clearly setup and torn down during driver load and unload. Global CRT initialization "hides" this initialization in a non-obvious way. Further, CRT atexit functionality is not supported. Emission of necessary synchronization enabling local static initialization of C++ objects is not done by the compiler. And would introduces non-obvious synchronization in the kernel. Lack of CRT initialization and atexit support is an intentional design decision. I strongly recommend avoiding it when developing kernel drivers.

As an example, the jxy namespace "wraps" std::vector and forces use of pool types and tags:

namespace jxy
{

template <typename T, 
          POOL_TYPE t_PoolType, 
          ULONG t_PoolTag, 
          typename TAllocator = jxy::allocator<T, t_PoolType, t_PoolTag>> 
using vector = std::vector<T, TAllocator>;

}

jxy::vector<int, PagedPool, '0GAT'> integers;
stlkrn!DriverEntry+0xea:
0: kd> dx integers
integers                 : { size=10 } [Type: std::vector<int,jxy::details::allocator<int,1,809976148> >]
    [<Raw View>]     [Type: std::vector<int,jxy::details::allocator<int,1,809976148> >]
    [capacity]       : 10
    [allocator]      : {...} [Type: std::_Compressed_pair<jxy::details::allocator<int,1,809976148>,std::_Vector_val<std::_Simple_types<int> >,1>]
    [0]              : 1 [Type: int]
    [1]              : 2 [Type: int]
    [2]              : 3 [Type: int]
    [3]              : 4 [Type: int]
    [4]              : 5 [Type: int]
    [5]              : 6 [Type: int]
    [6]              : 7 [Type: int]
    [7]              : 8 [Type: int]
    [8]              : 9 [Type: int]
    [9]              : 10 [Type: int]

Below is table of functionality under the jxy namespace:

jxylib STL equivalent Include Notes
jxy::allocator std::allocator <jxy/memory.hpp>
jxy::default_delete std::default_delete <jxy/memory.hpp>
jxy::unique_ptr std::unique_ptr <jxy/memory.hpp>
jxy::shared_ptr std::shared_ptr <jxy/memory.hpp>
jxy::basic_string std::basic_string <jxy/string.hpp>
jxy::string std::string <jxy/string.hpp>
jxy::wstring std::wstring <jxy/string.hpp>
jxy::vector std::vector <jxy/vector.hpp>
jxy::map std::map <jxy/map.hpp>
jxy::multimap std::miltimap <jxy/map.hpp>
jxy::mutex std::mutex <jxy/locks.hpp> Uses KGUARDED_MUTEX
jxy::shared_mutex std::shared_mutex <jxy/locks.hpp> Uses EX_PUSH_LOCK
jxy::unique_lock std::unique_lock <jxy/locks.hpp>
jxy::shared_lock std::shared_lock <jxy/locks.hpp>
jxy::scope_resource None <jxy/scope.hpp> Similar to std::experimental::unique_resource
jxy::scope_exit None <jxy/scope.hpp> Similar to std::experimental::scope_exit
jxy::thread std::thread <jxy/thread.hpp>
jxy::deque std::deque <jxy/deque.hpp>
jxy::queue std:queue <jxy/queue.hpp>
jxy::priority_queue std::priority_queue <jxy/queue.hpp>
jxy::set std::set <jxy/set.hpp>
jxy::multiset std::multiset <jxy/set.hpp>
jxy::stack std::stack <jxy/stack.hpp>

Tests - stltest.sys

The stltest project implements a driver that runs some tests against jxystl, usage of STL, and exceptions in the Windows Kernel.

Practical Usage - stlkrn.sys

The stlkrn project is a Windows Driver that uses jxylib to implement process, thread, and module tracking in the Windows Kernel.

stlkrn.sys registers for process, thread, and image notifications using functionality exported by ntoskrnl. Using these callbacks it tracks processes, threads, and image loads in various objects which use jxy::map, jxy::shared_mutex, jxy::wstring, and more.

The driver has two singletons. jxy::ProcessMap and jxy::ThreadMap, these are constructed when the driver loads (DriverEntry) and torn down when the driver unloads (DriverUnload). It is worth noting here each process tracked in the jxy::ProcessMap (implemented as jxy::ProcessContext) also manages a jxy::ThreadMap. Each "context" (jxy::ProcessContext, jxy::ThreadContext, and jxy::ModuleContext) is a shared (referenced) object (jxy::shared_ptr). Therefore, the thread context that exists in the thread map singleton is the same context associated with the process context.

Key components of stlkrn.sys:

Object Purpose Source Notes
jxy::ProcessContext Information for a process running on the system. process_context.hpp/cpp Uses jxy::wstring. Has thread (jxy::ThreadMap) and module (jxy::ModuleMap) map members.
jxy::ThreadContext Information for a thread running on the system. thread_context.hpp/cpp Uses std::atomic.
jxy::ModuleContext Information for an image loaded in a given process. module_context.hpp/cpp Uses jxy::wstring and jxy::shared_mutex.
jxy::ProcessMap Singleton, maps shared jxy::ProcessContext objects to a PID. process_map.hpp/cpp Singleton is accessed via jxy::GetProcessMap. Uses jxy::shared_mutex and jxy::map.
jxy::ThreadMap Maps shared jxy::ThreadContext objects to a TID. thread_map.hpp/cpp The global thread table (singleton) is accessed via jxy::GetThreadMap. Each jxy::ProcessContext also has a thread map which is accessed through jxy::ProcessContext::GetThreads. Uses jxy::shared_mutex and jxy::map.
jxy::GetModuleMap Maps shared jxy::ModuleContext to a loaded image extents (base and end address). module_map.hpp/cpp Each process context has a module map member. Loaded images for a given process are tracked using this object. Uses jxy::shared_mutex and jxy::map

std::unordered_map would have been a better choice over the ordered tree (std::map) for the object maps. There is a reason this isn't used (see TODO section).

stlkrn!jxy::nt::CreateProcessNotifyRoutine+0xa6:
3: kd> dx proc
proc                 : {...} [Type: std::shared_ptr<jxy::ProcessContext>]
    [<Raw View>]     [Type: std::shared_ptr<jxy::ProcessContext>]
    [ptr]            : 0xffffaa020d73cf70 [Type: jxy::ProcessContext *]
    [control block]  : custom deleter, custom allocator [Type: std::_Ref_count_resource_alloc<jxy::ProcessContext *,jxy::details::default_delete<jxy::ProcessContext,1,1668307018>,jxy::details::allocator<jxy::ProcessContext,1,1668307018> > (derived from std::_Ref_count_base)]
    [+0x000] m_ProcessId      : 0x2760 [Type: unsigned int]
    [+0x004] m_SessionId      : 0x2 [Type: unsigned int]
    [+0x008] m_ParentProcessId : 0xcc4 [Type: unsigned int]
    [+0x010] m_FileName       : "\Device\HarddiskVolume4\Windows\System32\cmd.exe" [Type: std::basic_string<unsigned short,std::char_traits<unsigned short>,jxy::details::allocator<unsigned short,1,1852856394> >]
    [+0x030] m_FilePart       : "cmd.exe" [Type: std::basic_string<unsigned short,std::char_traits<unsigned short>,jxy::details::allocator<unsigned short,1,1886410826> >]
    [+0x050] m_CreatorProcessId : 0x1b08 [Type: unsigned int]
    [+0x054] m_CreatorThreadId : 0x26a0 [Type: unsigned int]
    [+0x058] m_Threads        [Type: jxy::ThreadMap]
    [+0x070] m_Modules        [Type: jxy::ModuleMap]

TODO

Although jxy::shared_ptr is supported through std::shared_ptr directly. This implementation could be improved. Internally, std::shared_ptr will use a global new allocation in some circumstances. To avoid this jxy::make_shared is implemented to associate the appropriate pool tagged/typed allocator and deleter. This introduces an extra control block allocation for the shared reference, which is what std::make_shared aims to avoid. Unfortunately, attaching a control block to the container is not public functionality. This could be improved with some support by MSVC or by hand-rolling a jxy::shared_ptr which is better tuned for kernel-use.

I had wanted to include std::unordered_map initially, however it uses ceilf. Floating point arithmetic in the Windows Kernel comes with some challenges. So, for now it is omitted until an appropriate solution is designed.

Disclaimer

This solution is a passion project. At this time it is not intended for production code. x64 is well tested and stable, stlkrn.sys passes full driver verifier options (including randomized low resource simulation). Exception handling at or above dispatch has been tested, but not in practical use cases. x86 has not been tested. There is functionality under the jxy namespace that is incomplete/unused/untested. Your milage may vary - I would like to continue this work over time, if any issues/bugs are found feel free to open issues against this repo.

Related Work

This project provides STL support in the Windows Kernel by using as much of the STL facility as possible. There are other solutions for use of STL in kernel development. This section will outline alternatives, first I will summarize this work:

This Project:

  • Uses the STL directly. Does not reimplement any STL functionality unless absolutely necessary.
  • Requires pool types and tags. No global new or delete is implemented.
  • Forbid moving data between objects of different pools or tags.
  • Avoids CRT initialization and atexit functionality. CRT initialization order is non-obvious, driver initialization and teardown should be obvious. atexit functionality may introduce data races for kernel code, atexit is not implemented.

Bareflank Hypervisor:

Bareflank implements support for running C++ in their hypervisor. They have full STL and CRT support. This is a comprehensive project that enables a plethora features of the standard in kernel mode (including exceptions). As I understand their solution forces NonPagedPool on global new/delete allocations. I have to commend Bareflank with their implementation, it's well thought out and cross platform. However the Windows implementation builds through cygwin and "shims" in support for the Windows kernel. In comparison, this project aims to be considerate to the Windows kernel. It enables specifying pool tags and types (paged vs non-paged) and hopes to minimize "sharp edges" associated with using C++ and the STL in kernel mode. All that said, Bareflank is impressive for what is does. For an excellent presentation on Bareflank's support of C++ I highly recommend watching Dr. Rian Quinn's presentation at cppcon 2016.

Win32KernelSTL:

The Win32KernelSTL project does allow you to use STL functionality directly in the kernel. The project implements global new/delete and forces NonPagedPool, it implements CRT initialization support, and bugchecks when a cpp exception is thrown. It makes no attempt to do cpp exception unwinding. Due to the assumptions it makes I find it unpractical for any serious use cases. The code is reasonably clear and documented, I recommend giving this project a browse for educating around C++ support in the kernel. One note, the CRT code in Win32KernelSTL does implement atexit but keep in mind there is no synchronization emitted by the compiler here (as opposed to user mode). So a local static requiring insertion of an entry in the atexit list may race causing a double-init or double-free.

Driver Plus Plus:

This project implements necessary C++ facility for pulling in a number of C++ solutions into kernel mode (EASTL, msgpack, etc.). Driver Plus Plus implements CRT initialization and global new/delete support (which forces NonPagedPool). Again this is counter to the goals of this project. However, this project does enable a lot of great C++ facility for use in kernel mode. It does make modifications to the C++ solutions it pulls in to shim in support for it's use cases. Driver Plus Plus also makes the assumption around atexit as mentioned previously.

KTL:

KTL (Windows Kernel Template Library) reimplements a good amount of modern C++ functionality for use in the Windows Kernel. It also implements global new/delete but does a decent job at providing facility for specifying pool tags and types where possible. However this does mean the global allocator might hide an allocation in a non-obvious pool. Further the template allocators in this project carry the cost of two points for an allocator and deallocator object, I am also concerned that conversion between the allocator types may allow for cross pool/tag allocs/frees. Overall I'm impressed by the amount of facility that is implemented here. Reimplementation of STL functionality and the global allocators are counter to the ideologies of this project.

Kernel-Bridge:

Kernel-Bridge implements some great facility for Windows Kernel development. The library provides wrappers for registering for Windows callbacks using C++ objects. I would like to find more time to use and investigate this solution. It does implement CRT support. The atexit functionality implemented is not dynamic - it uses a static array, if it runs out of slots, it fails. The default new/delete forces NonPagedPool. It does not have full exception support, it will bugcheck if a cpp exception is thrown - it will not unwind objects on the stack.

Credits

This repository draws from some preexisting work. Credits to their authors.

  • C++ Exceptions in Windows Drivers
    This project implements parts of the Visual Studio runtime library that are needed for C++ exception handling. Currently, x86 and x64 platforms are supported.
  • Process Hacker Native API Headers
    Collection of Native API header files. Gathered from Microsoft header files and symbol files, as well as a lot of reverse engineering and guessing.