Sibyl

A Miasm2 based function divination.

This file is part of Sibyl.

Copyright 2014 - 2019 Camille MOUGEY

Idea

In reverse engineering, stripped binaries are common (malware, firmware, ...). They often embed common libraries, such as libc or openssl. Identifying these libraries and their functions can be a useful starting point, but it is a time-consuming task. It is made even harder by optimizations, the diversity of architectures and compilers, custom implementations, obfuscation, ...

Tools have been developed to automate this task. Some are based on CFG (Control Flow Graph) signatures (BinDiff), others on magic constants (FindCrypt) or enhanced pattern matching (FLIRT).

Sibyl is one of these tools, dynamic analysis oriented and based on Miasm2 (https://github.com/cea-sec/miasm). The idea is to identify functions from their side effects. That way, identification is independent of the implementation used.

Identification is done through these steps:

  1. Initialize a minimalist VM for the targeted architecture, with only needed elements
  2. Prepare the function call using the correct ABI and API
  3. Run the target function code inside the VM
  4. If the function crashes (null dereference, not enough stack arguments, ...), switch to the next test case
  4b. If the function ends correctly, compare the final VM state with the expected one. If they match, the function is a candidate for that test case
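The steps above can be sketched in plain Python. This is a toy model only: it does not use Miasm2 or Sibyl's real classes, the "VM" is a dictionary, and every name in it (Crash, run_test, identify, the memset-like test) is hypothetical and chosen for illustration.

```python
class Crash(Exception):
    """Raised by the toy VM when the emulated code faults."""


def run_test(func, setup, check):
    """Run one test case against one candidate function."""
    vm = {"memory": {}, "ret": None}   # step 1: fresh, minimal state
    args = setup(vm)                    # step 2: prepare the call
    try:
        vm["ret"] = func(vm, *args)     # step 3: run the target
    except Crash:
        return False                    # step 4: crash, try the next test
    return check(vm)                    # step 4b: compare the final state


def identify(func, tests):
    """Return the names of the tests for which func is a candidate."""
    return [name for name, (setup, check) in tests.items()
            if run_test(func, setup, check)]


# A hypothetical memset-like test: fill 8 bytes at 0x1000 with 'A'.
def setup_memset(vm):
    vm["memory"][0x1000] = bytearray(8)
    return (0x1000, 0x41, 8)


def check_memset(vm):
    return vm["ret"] == 0x1000 and vm["memory"][0x1000] == b"A" * 8


TESTS = {"memset": (setup_memset, check_memset)}


# Three toy targets standing in for unknown binary functions.
def toy_memset(vm, addr, value, size):
    buf = vm["memory"].get(addr)
    if buf is None or size > len(buf):
        raise Crash("invalid write")
    buf[:size] = bytes([value]) * size
    return addr


def toy_null_deref(vm, addr, value, size):
    raise Crash("null dereference")


def toy_nop(vm, addr, value, size):
    return addr


for target in (toy_memset, toy_null_deref, toy_nop):
    print(target.__name__, identify(target, TESTS))
```

Only the function whose side effects match (buffer content and return value) is reported; the crashing one and the one that leaves memory untouched are both discarded.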

For instance, if one wants to identify a strlen, the test will be as follows:

  1. Allocate a string containing "Hello %sworld!" in a read-only memory page
  2. Call the function with a pointer to the string as first argument
  3. Compare the result with 14
  4. Execute the same test with a different string to avoid false positives (detecting a function which always returns 14)
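A minimal sketch of this strlen test follows, again as a toy model rather than Sibyl's actual test class. The ReadOnlyPage class, the candidate functions and the second string are assumptions made for the example; only the first string and the expected value 14 come from the description above.

```python
class Crash(Exception):
    """Raised when the toy memory model detects an invalid access."""


class ReadOnlyPage:
    """A read-only memory page holding the test string."""

    def __init__(self, data):
        self._data = bytes(data)

    def read(self, offset):
        if not 0 <= offset < len(self._data):
            raise Crash("out-of-bounds read")
        return self._data[offset]

    def write(self, offset, value):
        raise Crash("write to read-only page")


def is_strlen_candidate(func, strings=(b"Hello %sworld!", b"Another test")):
    """Steps 1 to 4: the function must return the right length for every string."""
    for text in strings:
        page = ReadOnlyPage(text + b"\x00")   # step 1: allocate the string
        try:
            result = func(page, 0)            # step 2: call with a pointer to it
        except Crash:
            return False
        if result != len(text):               # step 3: compare the result
            return False
    return True                               # step 4: all strings agreed


def real_strlen(page, ptr):
    length = 0
    while page.read(ptr + length) != 0:
        length += 1
    return length


def always_14(page, ptr):
    return 14          # passes the first string only


def writes_to_arg(page, ptr):
    page.write(ptr, 0)  # a real strlen never writes to its argument
    return 0


for target in (real_strlen, always_14, writes_to_arg):
    print(target.__name__, is_strlen_candidate(target))
```

The second string is what rejects always_14, and the read-only page is what rejects a function that modifies its argument.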

Sibyl test cases are written independently of the architecture and the ABI.

In general, Sibyl is subject to both false positives (identifying a function that is not strlen as strlen) and false negatives (misidentifying or missing a real strlen). Under the hypothesis that the ABI used for the test is exactly the one used by the function, Sibyl becomes complete (no more false negatives).

As a sideline, Sibyl can be used to bruteforce a program ABI.

Long story short, this is an enhanced API bruteforcing tool.

Basic usage

Sibyl comes with a CLI, named sibyl, an IDA (https://www.hex-rays.com) stub and a GHIDRA (https://ghidra-sre.org/) stub.

CLI

The sibyl tool is a wrapper around several sub-actions.

$ sibyl
Usage: /usr/local/bin/sibyl [action]

Actions:
	config   Configuration management
	find     Function guesser
	func     Function discovering
	learn    Learn a new function

The main usage of Sibyl, function recognition, is done through the find action. This action comes with several options, to specify ABI, architecture, test cases, ...

To launch function recognition on the ARMv6 binary busybox-armv6l (busybox 1.21.1, http://www.busybox.net/downloads/binaries/1.21.1/), targeting addresses 0x8230 and 0x8550 and using the included test cases:

$ sibyl find binaries/busybox-armv6l 0x00008550 0x00008230
0x00008230 : strlen
0x00008550 : memmove
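For scripting, the CLI can be driven from a separate script. The sketch below is written for Python 3.7+ and relies only on what is shown above: the find action, its positional arguments, and one "address : name" result per line. The helper names (parse_find_output, run_find) are hypothetical, and the parser may need adjusting if the real output contains other line shapes.

```python
import re
import subprocess

# One result per line, as in the example output: "0x00008230 : strlen"
RESULT_LINE = re.compile(r"^(0x[0-9a-fA-F]+)\s*:\s*(\S+)$")


def parse_find_output(text):
    """Map each address to the list of names reported for it."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            address = int(match.group(1), 16)
            results.setdefault(address, []).append(match.group(2))
    return results


def run_find(binary, addresses):
    """Run `sibyl find` on the given addresses and parse its output."""
    command = ["sibyl", "find", binary] + ["0x%08x" % a for a in addresses]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_find_output(completed.stdout)


sample = "0x00008230 : strlen\n0x00008550 : memmove\n"
print(parse_find_output(sample))
```

With sibyl installed, run_find("binaries/busybox-armv6l", [0x8550, 0x8230]) would issue the same command as the example above.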

IDA stub

The IDA stub is located in ext/ida/find.py. If sibyl is installed on the system, no other action is needed to have it running (see the Installation section for more details).

Once the script has been loaded by IDA, the user is asked whether to launch Sibyl on the current function or on all functions detected by IDA.

The architecture and ABI are provided by IDA. Optionally, the set of tests to use can be modified.

On busybox-i486:

[Screenshot: IDA stub]

And the associated result:

Python>
Launch identification on 3085 function(s)
Found memcpy at 0x8057120
Found memmove at 0x805714c
Found memset at 0x8057174
Found strcat at 0x80571a8
Found strchr at 0x80571cc
Found strcmp at 0x8057208
Found strcpy at 0x8057228
Found strlen at 0x8057244
Found strncmp at 0x8057258
Found strncpy at 0x8057280
Found strnlen at 0x80572a8
Found strrchr at 0x80572c0
Found memcmp at 0x80572ff
Found strsep at 0x80576ac
Found strspn at 0x8057704
Found stricmp at 0x805799c
Found strpbrk at 0x8057ab8
Found strtok at 0x8057b30
Found strcmp at 0x8057b48
Found atoi at 0x805df1c
Current: 64.83% (sub_0x80b4ab3)| Estimated time remaining: 14.45s
Found atoi at 0x80f1cf3
Current: 100.00% (sub_0x80f7a93)| Estimated time remaining: 0.00s
Finished ! Found 21 candidates in 42.70s
Results are also available in 'sibyl_res'

Each identified function gets an additional comment such as [Sibyl] memmove?

Additionally, a launch_on_funcs method is provided for scripting purposes, and the result of the last run, in addition to the human-readable console output, is available in the sibyl_res variable.

Binary Ninja stub

An external stub for Binary Ninja, maintained by @kenoph, is also available.

GHIDRA stub

The GHIDRA stub is located in ext/ghidra/find.py. If sibyl is installed on the system, no other action is needed to have it running (see the Installation section for more details). One just needs to copy or link the script to a path known by GHIDRA (such as ~/ghidra_scripts).

It can then be called from the GHIDRA interface through the "Script Manager", in the "FunctionID" category.

[Screenshots: GHIDRA stub; commented function]

Documentation

More detailed documentation is available in doc.

Current version is v0.2. See changelog for more details.

Installation

Standard

Sibyl requires at least Miasm2 version v0.1.1 and the corresponding version of Elfesteem. For the qemu engine, the unicorn Python package must be installed (refer to the Unicorn documentation for more details).

Sibyl comes as a Python module, and the installation follows the standard procedure:

$ python setup.py build
# Add the resulting build directory in your PYTHONPATH, or:
$ python setup.py install

In addition to the sibyl Python module, a CLI tool is provided, named sibyl. See the usage documentation for more information.

If needed, consult the testing documentation to check your Sibyl installation.

IDA & GHIDRA

The IDA and GHIDRA stubs are located in ext/ida and ext/ghidra respectively. To benefit from multiprocessing, Sibyl is invoked through the CLI as a subprocess. As a result, there is no need to have the sibyl module in the IDA Python or GHIDRA namespace.

Long story short, it should work out of the box once the sibyl CLI is available.

Docker

Sibyl is also available as a Docker automated build. Use:

$ docker run -i -t commial/sibyl
Usage: /usr/local/bin/sibyl [action]

Actions:
	config   Configuration management
	find     Function guesser
	func     Function discovering
	learn    Learn a new function

Support

Test cases

Sibyl comes with several test cases, located in sibyl/test. These tests are based on functions from string.h, stdlib.h and ctype.h.

One can add custom test cases and reference them through the configuration file. Have a look at Configuration and Adding a new signature for more information.

Architectures by engine

Sibyl supports multiple architectures and multiple engines.

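| arch/jit | python | tcc | gcc | llvm | qemu |
|---|---|---|---|---|---|
| arml | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| armb | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| armtl (⚠️ use arml with +1 offset) | | | | | |
| armtb (⚠️ use armb with +1 offset) | | | | | |
| sh4 | | | | | |
| x86_16 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| x86_32 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| x86_64 | ✔️ | ⚠️ bad SSE support | ⚠️ bad SSE support | ✔️ | ✔️ |
| msp430 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| mips32b | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| mips32l | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| aarch64l | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| aarch64b | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |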
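The "+1 offset" note for armtl and armtb presumably follows the usual ARM interworking convention, where an odd target address selects Thumb mode. A small helper, hypothetical and not part of Sibyl, makes the adjustment explicit; the address used below is made up for the example.

```python
def thumb_entry(address):
    """Address to pass to an arml/armb run for a Thumb function.

    Thumb code is 2-byte aligned, so the real address is even; adding 1
    sets the low bit, which is how ARM marks a Thumb entry point.
    """
    if address % 2:
        raise ValueError("expected an even (2-byte aligned) function address")
    return address + 1


# Hypothetical Thumb function located at 0x10000
print(hex(thumb_entry(0x10000)))
```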

FAQ

Do not hesitate to consult the existing issues, or to open a new one, if further clarification is needed.

How are infinite loops managed?

Near-infinite loops are fairly common, especially when the arguments are not formatted as the function expects (for instance, when a test case written for another function is being tried). To guard against them, each sub-test runs with a timeout. The -i/--timeout argument adjusts this limit (2 by default; 0 disables the timeout).
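The timeout behavior can be pictured with the following sketch. It is not Sibyl's implementation: the emulation is reduced to a step() callable that returns True once the function has returned, the names are hypothetical, and the unit of the timeout is assumed to be seconds. Only the default of 2 and the meaning of 0 come from the text above.

```python
import time


class SubTestTimeout(Exception):
    """Raised when a sub-test exceeds its time budget."""


def run_until_done(step, timeout=2.0):
    """Call step() until it returns True or the deadline passes.

    A timeout of 0 disables the limit, mirroring the CLI option.
    """
    deadline = None if timeout == 0 else time.monotonic() + timeout
    while not step():
        if deadline is not None and time.monotonic() >= deadline:
            raise SubTestTimeout()


def sub_test_passes(step, timeout=2.0):
    """A sub-test that times out is simply counted as a failure."""
    try:
        run_until_done(step, timeout)
    except SubTestTimeout:
        return False
    return True


# A target that never returns is abandoned once the budget is spent.
print(sub_test_passes(lambda: False, timeout=0.05))
```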

How to run the tool on a custom architecture?

Once the architecture and the corresponding semantics are implemented in Miasm2, one just needs to implement the wanted ABI in sibyl/abi/. If writing the jitter engine part is an issue, one can directly use the Python jitter through the -j/--jitter argument. If the semantics are not complete enough, one can add the corresponding bridge with qemu in sibyl/engine/qemu.py, if available.

Is my sibyl func frozen?

Sibyl may take a while, depending on the number of functions to consider and the test set size (its time complexity is approximately O(number of functions × test set size)).

In addition, library functions are often located in the same region of the binary, which gives the impression that Sibyl produces results in bursts.

A convenient way to observe its progress is the use of the -v option.

How many coffees could I take while Sibyl is running?

| binary | architecture | test set size | addresses to check | number of functions found | elapsed time |
|---|---|---|---|---|---|
| busybox-i486 | x86_32 | 26 | 3085 | 21 | 36.0s |
| busybox-armv6l | arml | 26 | 3063 | 48 | 1m16.5s |
| busybox-mipsel | mips32l | 26 | 3065 | 16 | 44.0s |

These tests were run on a standard laptop (i7, 4 CPUs), using the default configuration (i.e. the qemu jitter) and the addresses provided by IDA.
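To relate these figures to the O(number of functions × test set size) estimate, the short script below derives the average wall-clock time per (address, test) pair from the table. The numbers are copied from the table; the helper name is hypothetical, and the result is only a rough average, since crashes end some runs early and a test may contain several sub-tests.

```python
# (test set size, addresses to check, elapsed seconds), from the table above
RUNS = {
    "busybox-i486": (26, 3085, 36.0),
    "busybox-armv6l": (26, 3063, 76.5),   # 1m16.5s
    "busybox-mipsel": (26, 3065, 44.0),
}


def ms_per_pair(tests, addresses, seconds):
    """Average wall-clock milliseconds per (address, test) pair."""
    return 1000.0 * seconds / (tests * addresses)


for name, (tests, addresses, seconds) in RUNS.items():
    pairs = tests * addresses
    print("%-15s %6d pairs, %.3f ms per pair"
          % (name, pairs, ms_per_pair(tests, addresses, seconds)))
```

Each binary amounts to roughly 80,000 pairs, so the average cost per pair stays below one millisecond in all three runs.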

Please note that, by design, Sibyl is embarrassingly parallel.
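Because each address is tested independently of the others, the address list can be split freely between workers. The sketch below illustrates the idea only: split and identify_parallel are hypothetical helpers, the worker is any callable that maps a list of addresses to a dictionary of results (for example a function spawning one sibyl find subprocess per slice), and the CLI may already distribute the work on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def split(addresses, parts):
    """Split addresses into at most `parts` interleaved slices."""
    parts = max(1, min(parts, len(addresses)))
    return [addresses[i::parts] for i in range(parts)]


def identify_parallel(addresses, worker, parts=4):
    """Run worker on each slice concurrently and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=parts) as pool:
        for partial in pool.map(worker, split(addresses, parts)):
            merged.update(partial)
    return merged


# Stand-in worker: a real one would run the identification on its slice.
def fake_worker(chunk):
    return {address: "func_%x" % address for address in chunk}


print(identify_parallel([0x8230, 0x8550, 0x8600], fake_worker, parts=2))
```

Threads are sufficient here under the assumption that the heavy work happens in subprocesses; no result depends on another slice, so the merge is a plain dictionary update.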