This is a demangler for “mangled” source code names. A “mangled” name is an encoded form of a function or other symbol that includes additional information (types, template arguments, etc.) and possibly also compression, such that the result is smaller and consists entirely of characters that are valid for a simple identifier (alphanumeric characters and underscores).
C++ is the pre-eminent source of mangled names, although other languages perform mangling as well.
Normal usage of this library will return the original mangled form if the demangling fails (usually because of incompleteness in this parser). There are occasional scenarios where the conversion to the string format is incomplete: these will generate obviously bad output. Let us know what the right output is and we’ll fix it!
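To make the idea of a mangled name concrete, here is a minimal sketch of a toy demangler. It is not this library's parser: the names toyParse and toyDemangle are hypothetical, and it only handles the simplest Itanium-style form (the _Z prefix, a length-prefixed name, then single-letter builtin type codes). It also illustrates the fallback behaviour described above by returning the input unchanged when it cannot parse it.

```haskell
import Data.Char (isDigit)
import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- | A handful of builtin type codes from the Itanium C++ ABI.
builtinType :: Char -> Maybe String
builtinType c = lookup c
  [ ('v', "void"), ('b', "bool"), ('c', "char"), ('i', "int")
  , ('j', "unsigned int"), ('l', "long"), ('m', "unsigned long")
  , ('f', "float"), ('d', "double") ]

-- | Hypothetical toy parser for "_Z<length><name><type codes>".
-- Anything else (operators, templates, nested names) yields Nothing.
toyParse :: String -> Maybe String
toyParse ('_':'Z':rest) = do
  let (digits, afterLen) = span isDigit rest
  n <- if null digits then Nothing else Just (read digits)
  let (name, codes) = splitAt n afterLen
  if length name /= n || null codes then Nothing else do
    args <- mapM builtinType codes
    -- A lone "v" encodes an empty parameter list.
    let shown = if args == ["void"] then [] else args
    Just (name ++ "(" ++ intercalate ", " shown ++ ")")
toyParse _ = Nothing

-- | Total wrapper: the input is returned unchanged when parsing fails.
toyDemangle :: String -> String
toyDemangle s = fromMaybe s (toyParse s)

main :: IO ()
main = mapM_ (putStrLn . toyDemangle)
  ["_Z3fooi", "_Z3barv", "_Z4quuxdm", "printf", "_Znwm"]
```

The last two inputs come back unchanged: printf is not mangled at all, and _Znwm uses an operator encoding that this toy does not understand (the real library does handle it).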
This library is intended to be used when processing compiled code that contains
mangled names (e.g. LLVM bitcode, ELF symbol tables, etc.). When processing
compiled code, it is not uncommon to encounter upwards of 10-15 thousand mangled
symbols, all of which may need to be demangled. To support the primary goals of
performance and tight memory footprint, this library uses a Context that is
essentially a state threaded through the demangling process to normalize
(i.e. use sharing) textual information. There are two primary entry points: one
that demangles a single string and internally creates and discards a Context
for that single demangling, and one that allows the re-use of a Context, which
is updated as each name is demangled.
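The sharing idea can be sketched independently of this library. The example below is an assumption-laden illustration, not the library's actual Context type: ToyContext, intern, and internAll are hypothetical names, and the table simply assigns one index to each distinct string so that repeated text is stored once while the state is threaded through many inputs.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- | Hypothetical stand-in for a Context: each distinct string gets one index.
newtype ToyContext = ToyContext (Map.Map String Int)

emptyContext :: ToyContext
emptyContext = ToyContext Map.empty

-- | Look a string up, adding it only if it has not been seen before.
intern :: ToyContext -> String -> (ToyContext, Int)
intern ctx@(ToyContext table) s =
  case Map.lookup s table of
    Just ix -> (ctx, ix)
    Nothing -> let ix = Map.size table
               in (ToyContext (Map.insert s ix table), ix)

-- | Thread a single context through many inputs, as a re-usable
-- Context would be threaded through many demanglings.
internAll :: ToyContext -> [String] -> (ToyContext, [Int])
internAll = mapAccumL intern

-- | Number of distinct strings stored so far.
contextSize :: ToyContext -> Int
contextSize (ToyContext table) = Map.size table

main :: IO ()
main = do
  let (ctx, ids) = internAll emptyContext ["std", "vector", "std", "string", "std"]
  print ids                -- indices; repeated strings share an index
  print (contextSize ctx)  -- only distinct strings are stored
```

Creating a fresh table for every name would lose this sharing, which is why re-using one context across thousands of symbols helps the memory footprint.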
The demangling process is designed to never fail or throw exceptions: if a name
cannot be demangled it is assumed to be an unmangled name (or garbage) and the
original input is returned unchanged. It is also possible to build this library
with the debug flag enabled ($ cabal configure -fdebug). In the debug mode,
the library will panic on unrecognized parse scenarios or unimplemented string
output forms. The panic provides additional information to help fix this issue,
but is obviously a sub-optimal result for production code, so please only use the
debug mode in appropriate scenarios.
C++ mangling is a twisted and ill-documented process; the primary reference is the Itanium C++ ABI (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling). There are lots of special considerations (e.g. expressions can be template arguments that must be mangled, thread-local data and guard variables are encoded), as well as relations to C++ syntax (e.g. nested template args require a space between closing ‘>’ characters). This library is NOT YET a complete implementation: there are mangled forms that are not correctly parsed and/or printed; expect updates as these are encountered and needed (and patches are welcome).
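As a small illustration of the syntax concern mentioned above, the following sketch renders nested template arguments with a space between adjacent closing ‘>’ characters. The TName type and render function are hypothetical and far simpler than this library's AST; they only show the printing rule.

```haskell
import Data.List (intercalate, isSuffixOf)

-- | Hypothetical miniature AST: a name with zero or more template arguments.
data TName = TName String [TName]

-- | Render a name, inserting a space wherever two closing '>' characters
-- would otherwise be adjacent.
render :: TName -> String
render (TName n [])   = n
render (TName n args) =
  let inner = intercalate ", " (map render args)
      pad   = if ">" `isSuffixOf` inner then " " else ""
  in n ++ "<" ++ inner ++ pad ++ ">"

main :: IO ()
main = putStrLn (render nested)
  where
    nested = TName "std::vector" [TName "std::vector" [TName "int" []]]
```

For the nested value in main this yields std::vector<std::vector<int> > rather than a form ending in ">>".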
This is not the only Haskell demangler library; there is at least one other that is built on the Boomerang parsing library. That library has the advantage of being bi-directional, but the disadvantage that it is slow and uses considerably more memory. These disadvantages become very significant when processing compiled code that contains 10K-15K+ mangled names, so this library is not bi-directional but is instead oriented towards performance and memory utilization.
This is available as a library that can be used by other applications to demangle
names, and also provides a demangle application that is the equivalent of the
c++filt application to demangle names provided on the command line or as
standard input.
$ cabal run demangle _Znwm
The output of the demangling function(s) is an AST that represents the name in a
semantically rich (but messy) manner. The “sayable” library
(https://hackage.haskell.org/package/sayable) is used to convert this AST to an
actual rich string representation: there is a Sayable instance for each
(AST_object, Context) tuple. The instances defined support the "diagnostic"
saytag for obtaining some additional information during output; normally the
"normal" saytag should be used.
putStrLn $ sez_ @"normal" $ demangle1 "_Znwm"
Please use the issue tracker to notify us of any issues. There are certain to be some.