Common format for transferring and applying function information across binary analysis tools

WARP

WARP provides a common format for transferring and applying function information across binary analysis tools.

WARP Integrations

Binary Ninja

WARP integration is available as an open source first-party plugin for Binary Ninja and as such ships by default.

Function Identification

Function identification is the main way to interact with WARP, allowing tooling to utilize WARP's dataset to identify common functions within any binary efficiently and accurately.

Integration Requirements

To integrate with WARP function matching you must be able to:

  1. Disassemble instructions
  2. Identify basic blocks that make up a function
  3. Identify register groups with an implicit extend operation
  4. Identify relocatable instructions (see What is considered a relocatable operand?)

Creating a Function GUID

The function GUID is the UUIDv5 of the basic block GUIDs (sorted highest to lowest start address) that make up the function.

Example

Given the following sorted basic blocks:

  1. 036cccf0-8239-5b84-a811-60efc2d7eeb0
  2. 3ed5c023-658d-5511-9710-40814f31af50
  3. 8a076c92-0ba0-540d-b724-7fd5838da9df

The function GUID will be 7a55be03-76b7-5cb5-bae9-4edcf47795ac.

Example Code
import uuid
from hashlib import sha1

def uuid5(namespace, name_bytes):
    """Generate a UUIDv5 from the SHA-1 hash of a namespace UUID and raw name bytes."""
    digest = sha1(namespace.bytes + name_bytes).digest()
    # A UUID is 16 bytes; version=5 sets the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)

# Namespace for function GUIDs.
function_namespace = uuid.UUID("0192a179-61ac-7cef-88ed-012296e9492f")
# Basic block GUIDs, already sorted.
bb1 = uuid.UUID("036cccf0-8239-5b84-a811-60efc2d7eeb0")
bb2 = uuid.UUID("3ed5c023-658d-5511-9710-40814f31af50")
bb3 = uuid.UUID("8a076c92-0ba0-540d-b724-7fd5838da9df")
# The name is the concatenation of the raw 16-byte basic block GUIDs.
function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes)
print(function)

What is the UUIDv5 namespace?

The namespace for Function GUIDs is 0192a179-61ac-7cef-88ed-012296e9492f.

Creating a Basic Block GUID

The basic block GUID is the UUIDv5 of the byte sequence of the instructions (sorted in execution order) with the following properties:

  1. Zero out all instructions containing a relocatable operand.
  2. Exclude all NOP instructions.
  3. Exclude all instructions that set a register to itself if they are effectively NOPs.

When are instructions that set a register to itself removed?

Compilers can inject these instructions at the start of a function to support hot-patching (see: 1 and 2), so we must remove them. This does not affect the accuracy of the function GUID, as they are only removed when the instruction is effectively a NOP:

  • Self-assignments within register groups that have no implicit extension will be removed (see: 3 (under 3.4.1.1))

For the x86_64 architecture this means mov edi, edi will not be removed, because writing edi implicitly zero-extends into rdi. For the x86 architecture the same instruction has no effect and will be removed.
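
The rule above can be sketched as a small predicate. This is only an illustration: the RegisterMove type and its implicit_extend flag are hypothetical names, and a real integration would derive that flag from its own register model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterMove:
    """Hypothetical, simplified view of a register-to-register move."""
    dest: str
    src: str
    implicit_extend: bool  # True if writing dest also modifies a wider register


def is_effective_nop(move: RegisterMove) -> bool:
    """Return True if the move leaves all register state unchanged."""
    return move.dest == move.src and not move.implicit_extend


# x86: edi is the full register, so `mov edi, edi` changes nothing.
x86_mov = RegisterMove(dest="edi", src="edi", implicit_extend=False)
# x86_64: writing edi zero-extends into rdi, so the move has a side effect.
x86_64_mov = RegisterMove(dest="edi", src="edi", implicit_extend=True)

print(is_effective_nop(x86_mov))     # True
print(is_effective_nop(x86_64_mov))  # False
```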

What is considered a relocatable operand?

An operand that is used as a pointer to a mapped region.

For the x86 architecture the instruction e8b55b0100 (or call 0x15bba) would be zeroed.
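
As a rough illustration of that example, the snippet below decodes the relative call and zeroes it. It assumes the instruction sits at address 0 (which is what makes the target come out as 0x15bba) and reads "zeroed" as replacing every byte of the instruction with zero; both are assumptions made for this sketch.

```python
instruction = bytes.fromhex("e8b55b0100")

opcode = instruction[0]  # 0xe8: CALL rel32
rel32 = int.from_bytes(instruction[1:5], "little", signed=True)

# Assumption: the instruction is located at address 0, so the next
# instruction starts at address 5 (the length of this instruction).
instruction_address = 0
target = instruction_address + len(instruction) + rel32

# Replace the whole instruction with zero bytes of the same length.
zeroed = bytes(len(instruction))

print(hex(rel32))    # 0x15bb5
print(hex(target))   # 0x15bba
print(zeroed.hex())  # 0000000000
```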

What is the UUIDv5 namespace?

The namespace for Basic Block GUIDs is 0192a178-7a5f-7936-8653-3cbaa7d6afe7.
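
Putting the rules of this section together, a basic block GUID could be computed roughly as follows. This is a minimal sketch, not the reference implementation: the Instruction type and its is_nop and is_relocatable flags are hypothetical and assumed to be filled in by the disassembler, and zeroing is taken to cover every byte of the instruction.

```python
import uuid
from hashlib import sha1
from typing import Iterable, NamedTuple

BASIC_BLOCK_NAMESPACE = uuid.UUID("0192a178-7a5f-7936-8653-3cbaa7d6afe7")


class Instruction(NamedTuple):
    """Hypothetical pre-analyzed instruction; a real tool derives these flags."""
    raw: bytes
    is_nop: bool = False          # NOPs and effective-NOP self-moves
    is_relocatable: bool = False  # has an operand pointing into a mapped region


def uuid5_bytes(namespace: uuid.UUID, name: bytes) -> uuid.UUID:
    """UUIDv5 over raw bytes (SHA-1 of namespace bytes + name bytes)."""
    digest = sha1(namespace.bytes + name).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


def basic_block_guid(instructions: Iterable[Instruction]) -> uuid.UUID:
    """Hash a block's instructions (given in execution order) after normalization."""
    normalized = bytearray()
    for insn in instructions:
        if insn.is_nop:
            continue                            # rules 2 and 3: drop NOPs
        if insn.is_relocatable:
            normalized += bytes(len(insn.raw))  # rule 1: zero the instruction
        else:
            normalized += insn.raw
    return uuid5_bytes(BASIC_BLOCK_NAMESPACE, bytes(normalized))


block = [
    Instruction(bytes.fromhex("90"), is_nop=True),                  # nop
    Instruction(bytes.fromhex("55")),                               # push ebp
    Instruction(bytes.fromhex("e8b55b0100"), is_relocatable=True),  # call rel32
    Instruction(bytes.fromhex("c3")),                               # ret
]
# Same code without the NOP and with a different call target.
moved = [
    Instruction(bytes.fromhex("55")),
    Instruction(bytes.fromhex("e800100000"), is_relocatable=True),
    Instruction(bytes.fromhex("c3")),
]

print(basic_block_guid(block))
print(basic_block_guid(block) == basic_block_guid(moved))  # True
```

Because the NOP is dropped and the call is zeroed, both blocks normalize to the same bytes and therefore share a GUID.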

Function Constraints

Function constraints allow us to further disambiguate between functions with the same GUID. When creating the functions we store information about the following:

  • Called functions
  • Caller functions
  • Adjacent functions

Each entry in the lists above is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.
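
One way to picture how constraints narrow a set of matches is the sketch below. The Candidate type, the string labels used for constraints, and the "most shared constraints wins" scoring are all assumptions made for illustration; they are not the stored format or the matching policy of any particular integration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical dataset entry: a named function plus its stored constraints."""
    name: str
    constraints: set = field(default_factory=set)


def best_match(candidates, observed):
    """Pick the candidate sharing the most constraints with the observed function.

    Returns None when no candidate shares a constraint or the top score is tied.
    """
    scored = sorted(
        ((len(c.constraints & observed), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


# Two dataset functions that happen to share one function GUID.
candidates = [
    Candidate("list_push", {"calls:malloc", "caller:list_new"}),
    Candidate("vec_push", {"calls:realloc", "adjacent:vec_pop"}),
]
# Constraints collected for the function being identified.
observed = {"calls:realloc", "adjacent:vec_pop", "caller:main"}

match = best_match(candidates, observed)
print(match.name if match else None)  # vec_push
```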

Why don't we require matching on constraints for trivial functions?

The decision to match on constraints is left to the user. While requiring constraint matching for functions from all datasets can reduce false positives, it may not always be necessary. For example, when transferring functions from one version of a binary to another version of the same binary, not matching on constraints for trivial functions might be acceptable.

Comparison of Function Recognition Tools

WARP vs FLIRT

The main difference between WARP and FLIRT is the approach to identification.

Function Identification

  • WARP: function identification is described above (see Creating a Function GUID).
  • FLIRT uses an incomplete function byte sequence with a mask, where there is a single function entry (see: IDA FLIRT Documentation for a full description).

In practice, WARP will produce fewer false positives from the initial function identification alone. When more than one function is returned, we can use the list of Function Constraints to select the best possible match. However, that comes at the cost of computing a GUID whenever a lookup is requested, and of matching only when the function GUID is exactly the same.
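
The trade-off can be illustrated with a toy comparison. The masked matcher below shows the general idea of matching a partial byte sequence under a mask; it is not FLIRT's actual signature format, and the pattern, mask, and sample functions are made up for this sketch. The normalized byte strings stand in for the zeroing step described earlier.

```python
from hashlib import sha1


def masked_prefix_match(data: bytes, pattern: bytes, mask: bytes) -> bool:
    """True if data matches pattern on every bit that is set in mask."""
    if len(data) < len(pattern):
        return False
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))


# Hypothetical signature: push ebp; call <any target>
pattern = bytes.fromhex("55e800000000")
mask = bytes.fromhex("ffff00000000")

# Two different functions that share the same first two instructions.
func_a = bytes.fromhex("55e8b55b0100c3")      # push ebp; call; ret
func_b = bytes.fromhex("55e80010000031c0c3")  # push ebp; call; xor eax, eax; ret

# The same functions with the call zeroed, as in the basic block rules.
norm_a = bytes.fromhex("550000000000c3")
norm_b = bytes.fromhex("55000000000031c0c3")

prefix_hits = [masked_prefix_match(f, pattern, mask) for f in (func_a, func_b)]
hashes_equal = sha1(norm_a).digest() == sha1(norm_b).digest()

print(prefix_hits)   # [True, True]
print(hashes_equal)  # False
```

Both functions satisfy the masked prefix, while hashing the full normalized bytes tells them apart. The flip side is that any change to the function body changes the hash, so an exact-GUID lookup will not find a near match.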