FunctionInliner is an IDA plugin that can be used to ease the reversing of binaries that have been
space-optimized with function outlining (e.g. clang --moutline
).
Our plugin works by creating a clone of the outlined function per each of its xref callers, and linking it to the caller directly by replacing the BL to with a regular branch to the clone and the RET of the clone with a branch back to the caller (thus adding the clone as a function chunk of its caller).
In case the an outlined function has been succesfully inlined into all of its callers, it'll be
renamed to have the inlined_
prefix, to make it easy to identify in functions/xrefs listing.
The plugin supports both manually choosing functions to inline from their context menu, and heuristically identifying all outlined functions and inlining them.
Code with outlined functions is a pain to reverse because the outlined functions usually use registers and memory that is local to their caller (i.e. they don't conform to the ABI). Therefore you don't have the entire context when reversing the caller and have to jump back and forth into those outlined parts to follow what's going on.
Moreover, reversing code with outlined functions using Hex Rays simply doesn't work since Hex Rays assumes that functions conform to the ABI in order to do its magic. Moreover, if you'll try to jump into the outlined function in Hex Rays you'll often see them as empty because of that.
As an example we used gzip 1.3.5 which is a single source file that was easy to work with, and we
looked at the beginning of a single function from it (bi_windup
):
On the left, you see the function compiled with -O3
and in the middle you see it compiled with
-O3 -moutline
. Calls to outlined functions were highlighted (obviously, these wouldn't have stood
out from other calls in case symbols have been stripped).
We've also marked some screwups in Hex Rays' decompilation that were caused by these outlined functions not conforming to the ABI.
On the right, you see the same function after our whole-IDB analysis has been applied. You can see that most outlined functions have been automatically inlined, and all decompilation screwups have been resolved.
To be exact, in this file our whole-IDB analysis found and automayically inlined 130 out of 165 outlined functions, with no false positives. The rest of the outlined functions can be easily inlined from their context menu in case they're manually identified later.
Specifically in this example, you can also see that OUTLINED_FUNCTION_13
(which was not
automatically inlined) is a simple wrapper to write
which specifies nbytes = 0x4000
. In this
case we could never determine whether this was an original wrapper function or an outlined function
that we should inline.
- Install the dependencies listed in
requirements.txt
where IDA can import them. For example using/path/to/python3/used/by/ida -m pip install -r requirements.txt
. - Install keypatch.
- Clone this repository and symlink
~/.idapro/plugins/functioninliner.py
tofunctioninliner.py
in the cloned repo.
From the menu select Edit -> Plugins -> FunctionInliner -> Patch constant register-based calls to regular calls
and then Edit -> Plugin -> FunctionInliner -> Inline all outlined functions
in
order to try and do everything we can to make the IDB more readable.
Note: all of the context menus described below work both in IDA views and in Psuedocode views.
Right-click on a BL
to an outlined function, or on the beginning of an outlined function and
choose Inline function
.
Note that the cloning logic does not support functions which consist of multiple function chunks. For such cases, you should dechunk the function manually, or have it done automatically by running our whole-IDB processing.
Right-click on a B
to the cloned code that was originally outlined, on the begining of the cloned
code, or on the beginning of the original outlined function and choose Undo function inlining
The plugin also supports working on the entire IDB and inlining every function that is identified
as an outlined function. See the Principals of operation
section for the heuristics used to
identify these.
From the menu select Edit -> Plugins -> FunctionInliner -> Inline all outlined functions
in order
to scan all of the functions in the binary and inline those who are identified as outlined.
Note that we first do some preprocessing on the entire IDB in order to fix various situations that may have occured from IDA auto-analyzing the IDB without taking outlined functions into consideration.
In some cases the compiler and linker generate register-based calls for constant addresses (and not regular calls). IDA obviously doesn't generate call xrefs in these cases (but data xrefs) and so our inlining logic cannot patch these calls.
From the menu select Edit -> Plugins -> FunctionInliner -> Patch constant register-based calls to regular calls
in order to scan all of the IDB for these patterns and patch them to regular calls.
Since this behaviour is actively patching the IDB we kept it as a separate (optional) action, and do
not do this as part of the Inline all outlined functions
preprocessing logic.
Our preprocessing is comprised of a number of steps:
- Exploration steps are repeated until there's nothing new to be done:
- We fixes cases where some function lines were not analysed as code, because IDA falsly identified some function a NORET so it didn't analyse the code after BLs to it.
- We create functions at xref targets that IDA didn't make a function out of.
- Preprocessing steps are done afterward the exploration:
- We dechunks all of the functions in the binary (split each chunk into a separate function). This helps us identify later on which chunks were outlined and which are "real" functions. Plus, our cloning logic doesn't support chunked functions.
- We split functions that are placed right before another function they tail-call into, and were treated by IDA as one whole function.
- We split adjacent functions that were treated by IDA as one whole function.
For each xref to the outlined function, we create a new segment named
inlined_0x{func_ea:x}_for_0x{src_ea:x}
and clone the function there.
When cloning, we in fact have to translate some of the opcodes on the way -- if an opcode has relative data or code xrefs we need to fix them to work from the new location. We also may have to fix relative xrefs inside the cloned code because our translation may move stuff around in the clone as well.
We then replace the original BL
to the outlined function with a B
to the cloned code, and
replace the RET
in the end of the outlined function with a B
back to the caller.
There are of course edge cases when the outlined function tail-calls some other function, or when the outlined function is tail-called by its caller, which should be handled.
We also take care to find a spot for the cloned code segment which will be close enough to the caller and to outgoing xrefs from the clone in order to use regular branches back and forth.
Currently we use a few heuristics to identify outlined functions.
There may be false-negatives (i.e. we may miss some outlined functions) but we expect their count to be pretty low and they can always be inlined manually when encountered.
Also, in case there will be any false-positives (i.e. we'll identify some real functions as outlined and inline them into their callers) the effect shouldn't be that bad for RE and can also be undone manually.
The heuristics we use are the following:
We expect outlined functions to have more than one caller (otherwise it wouldn't have been useful to outline them)for some reason this doesn't hold in real cases, so we've dropped this heuristic.- We expect all outlined functions not to have a prologue (not really sure about that, but it makes sense). This is more of an optimization for us, in order not to statically analyze every function in the IDB.
- We expect some outlined functions not to conform to the ABI and to make use of non-argument registers that were not initialized internally.
- We expect some outlined functions not to conform to the ABI and to leave side-effects on non-result registers (that are not propagated to any result register or stored in memory).
- The last two heuristics also hold for condition flags and not registers (i.e. if the function is using uninitializing/setting unused condition flags).
There are some cases of outlined functions that we currently don't auto detect with our heuristics:
- Some outlined functions leave side-effects on higher result registers but do not set all of the lower ones, so it's obvious that they're not just returning a structure by value.
- Consider removing the heuristic about outlined functions not having prologues, since we've seen cases of outlined prologues.
There are some cases which our cloning logic doesn't support:
- Some uses of conditional opcodes (e.g. when the call to the outlined function is a conditional tail-call).
Some other stuff:
- When running our heuristics, we currently stop analyzing a function if we encounter a BL or a tail-call in it, because that may lead to another outlined function. The proper handling should probably be to analyze and inline all functions in order of a topological sort based on call targets (i.e. first analyze and inline functions which don't call anything, then those who call already analyzed functions, and so on).
- When running our heuristics, we usually stop analyzing a function if we encounter more than one basic block. We could theoretically continue analyzing recursively in each of the branches.
The plugin currently works only on ARM64 binaries that conform to the ABI.
Authored by Tomer Harpaz of Cellebrite Security Research Labs. Developed and tested for IDA 7.6 on macOS with Python 3.7.9.