RawPDB is a C++17 library that directly reads Microsoft Program Debug Database PDB files. The code is extracted almost directly from the upcoming 2.0 release of Live++.
RawPDB gives you direct access to the stream data contained in a PDB file. It does not attempt to offer abstractions for iterating symbols, translation units, contributions, etc.
Building a high-level abstraction over the provided low-level data is an ill-fated attempt that can never really be performant for everybody, because different tools like debuggers, hot-reload tools (e.g. Live++), profilers (e.g. Superluminal), need to perform different queries against the stored data.
We therefore believe the best solution is to offer direct access to the underlying data, with applications bringing that data into their own structures.
Eventually, we want RawPDB to become the de-facto replacement of Microsoft's DIA SDK that most C++ developers (have to) use.
- Fast - RawPDB works directly with memory-mapped data, so only the data from the streams you touch affect performance. It is orders of magnitudes faster than the DIA SDK, and faster than comparable LLVM code
- Scalable - RawPDB's API gives you access to individual streams that can all be read concurrently in a trivial fashion, since all returned data structures are immutable. There are no locks or waits anywhere inside the library
- Lightweight - RawPDB is small and compiles in roughly 1 second
- Allocation-friendly - RawPDB performs only a few allocations, and those can be overridden easily by changing the underlying macro
- No STL - RawPDB does not need any STL containers or algorithms
- No exceptions - RawPDB does not use exceptions
- No RTTI - RawPDB does not need RTTI or use class hierarchies
- High quality code - RawPDB compiles clean under -Wall
Running the Symbols example on a 2GiB PDB yields the following output:
Opening PDB file C:\dev\2GiB.pdb Reading image section stream...done Reading module info stream...done Reading section contribution stream...done Storing 3780654 section contributions...done Reading symbol record stream...done Reading public symbol stream...done Parsing 1804791 public symbols...done Reading global symbol stream...done Parsing 3948114 global symbols...done Reading and parsing 4848 module streams...done Stored 3798885 symbols in std::vector using std::string Running time: 1679.323ms
This is at least an order of magnitude faster than DIA, even though the example code is completely serial and uses std::vector and std::string, which are used for illustration purposes only.
When reading streams in a concurrent fashion, you will most likely be limited by the speed at which the OS can bring the data into your process.
RawPDB gives you access to the following PDB stream data:
-
DBI stream data
- Public symbols
- Global symbols
- Modules
- Module symbols
- Image sections
- Info stream
- Section contributions
- Source files
-
IPI stream data
At the moment, there is no support for C13 line information and the TPI type stream data, because Live++ does not make use of that information yet. However, we will gladly accept PRs, or implement support in the future.
If you are unfamiliar with the basic structure of a PDB file, the LLVM documentation serves as a good introduction.
Consult the example code to see how to read and parse the PDB streams.
- bin: contains final binary output files (.exe and .pdb)
- build: contains Visual Studio 2019 solution and project files
- lib: contains the RawPDB library output files (.lib and .pdb)
- src: contains the RawPDB source code, as well as example code
- temp: contains intermediate build artefacts
A simple example that shows how to load contributions and symbols from public, global, and module streams.
Please feel free to use RawPDB in a commercial application. If you would like to support its development, consider licensing Live++ instead. Not only do you give something back, but get a great productivity enhancement on top!