/wilhelm

Alternative API for IDA / Hex-Rays

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

wilhelm: Alternative API for IDA and Hex-Rays

wilhelm is an API for working with IDA, and in particular the Hex-Rays decompiler. It aims to wrap around the existing SDK's API, plus provide additional features and concepts that make reverse engineering easier.

While wilhelm works well in scripts, it is also designed with the REPL in mind: it is tailored to be easy to use interactively, to help answer simple questions while reversing is taking place. For example, you can use it to search a group of functions for a specific code pattern. As such, wilhelm contains an event system, allowing it to react and update itself whenever the underlying IDB is modified.

Currently, the main feature provided by wilhelm is convenient access to a decompiled function's AST. The next major feature to be added is type management.

Example Usage

Initialize:

>>> import wilhelm as W
>>> W.initialize(Feature.PATH, Feature.MODULE)

Access the AST of some function in the current module:

>>> func = W.current().values["sub_12345"]().func
>>> func.body[0]
<wilhelm.ast.IfStmt at 0xXXXXXXXXXXXX>

>>> func.body[0].expr.op
<OP.UGT: 32>

Find all call expressions in the function:

>>> list(func.select("*/CallExpr"))
[<wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>]

Get the names of the callee of the call expressions:

>>> [W.current().get_qname_for_addr(e.addr) for e in func.select("*/CallExpr.e_func/")]
[QName<sub_1412C0>, 
 QName<sub_153DD0>, 
 QName<sub_15DA70>, 
 QName<sub_165120>, 
 QName<sub_1664F0>]

Get all calls expressions that are calling function at address 0x43213:

>>> calls = func.select("*/CallExpr[.e_func/GlobalVarExpr{addr = 0x43213}]")
>>> list(calls)
[<wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>,
 <wilhelm.ast.CallExpr at 0xXXXXXXXXXXXX>]

Get string value of 2nd argument to the above calls:

>>> [e.params[1].value for e in calls if isinstance(e.params[1], W.ast.StrExpr)]
[b'command', b'description']

Dependencies

Requires IDAPython 3, no support for Python 2.

wilhelm requires a working async event loop in IDAPython. The easiest way to get this is by installing qasync, which provides a Qt-based event loop. This loop must be initialized prior to loading wilhelm.

The optional path feature requires pyparsing.

Not a dependency, but using ipyia makes using wilhelm a lot easier.

Installation

wilhelm has yet to be properly packaged. For now, you can use it by cloning the repository and adding the python/ subdirectory to your sys.path somehow.

Note that you need an async event loop setup before you load wilhelm. If you're using qasync, you can add something like this to your idapythonrc:

# Sync asyncio and Qt event loop
from PyQt5.QtWidgets import QApplication
import qasync
import asyncio
qapp = QApplication.instance()
loop = qasync.QEventLoop(qapp, already_running=True)
asyncio.set_event_loop(loop)

Configuration

import wilhelm as W
W.initialize()                             # Init with only core features
# or:
W.initialize(W.Feature.PATH, W.Feature.MODULE) # Init with optional features

Features

Abstract Syntax Tree Access

wilhelm provides a more object-oriented/Pythonic way of accessing a decompiled function's AST. Nodes in the AST have a different class based on the kind of nodes they are, and expose relevant values as fields. A Visitor class can be used to traverse the AST.

A NodeList represents a collection of AST nodes, and provides ways of mapping and filtering the list. This can be used to quickly locate a specific node of interest.

>>> list(nodelist.filter_class(W.ast.BinOpExpr).filter_test(lambda n: n.op == W.ast.OP.EQ))
[<wilhelm.ast.BinOpExpr at 0xXXXXXXXXXXXX>]

AST Wilpaths

The optional path feature provides wilpaths, which are a way to easily navigate and select nodes in an AST. Inspired by the XPath query language for XML, a wilpath builds upon the filtering and mapping features of NodeLists.

Some examples of wilpaths:

  • IfStmt Returns all if statements found in the current node list.

  • /IfStmt Returns all children that are if statements.

  • */IfStmt Returns any descendent that is an if statement.

  • */IfStmt.expr/ Returns the condition expression of all if statement descendents.

  • */IfStmt.expr/*/GlobalVarExpr Returns all global variable expressions that are found within an if statement.

  • */IfStmt.expr/*/GlobalVarExpr{addr = 0x1234} The above, but only those global variable expressions which have an address of 0x1234.

  • */IfStmt[.expr/*/GlobalVarExpr{addr = 0x1234}] The above, but instead of returning the global variable expressions, return the parent if statement.

Please see the docstring in path.py for a complete description of the wilpath DSL.

Event System

wilhelm uses an event system that allows users to register and observe various kinds of events happening within IDA. For example, a callback can be added to trigger whenever some property of a function changes.

Events can propagate, such that one could observe all events happening the children of a parent object, and vice versa.

Currently, the event system is only integrated with the naming system (QNames), but eventually will be available in other features as well, particularly the type system.

Module Representation

The module feature provides a way of accessing the currently-loaded IDA database (aka module). All objects in the database have an associated qualified name (QName), which is kept in sync with the name used by IDA. QNames allow navigation and searching based on their structure: e.g. you can query for all names that are in a particular namespace like foo::SomeClass. Renaming a namespace also automatically updates the names within that namespace.

Querying a name returns a representation of the object. For functions, the AST of the function can be easily accessed via this representation:

wilhelm.current().values["sub_12345"]().func.body[0].expr.e_lhs

The module feature is currently in development, and hence optional, but it will eventually form a core part of wilhelm.

Known Bugs

  • The AST representation does not update when the function gets updated. This will be fixed soon; the AST code was written before the event code, so it needs to be updated to react to events like variable renaming.

Credits

TODO

License

GNU General Public License v3.0

See LICENSE for full text.