ANVILL forges beautiful LLVM bitcode out of raw machine code

Primary language: C++. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Anvill

Anvill implements simple machine code lifting primitives using Remill. The goal of these components is to produce high quality bitcode, which can then be further decompiled to C (via Clang ASTs) using Rellic.

We define "high quality bitcode" as being similar in form to what the Clang compiler would produce if it were executed on a semantically equivalent C function.

Getting Help

If you run into undocumented problems with Anvill, ask for help in the #binary-lifting channel of the Empire Hacking Slack.

Supported Platforms

Anvill is supported on Linux platforms and has been tested on Ubuntu 18.04 and 20.04.

Dependencies

Most of Anvill's dependencies can be provided by the cxx-common repository. Trail of Bits hosts downloadable, pre-built versions of cxx-common, which makes it substantially easier to get up and running with Anvill. Nonetheless, the following table lists most of Anvill's dependencies.

Name           Version
-------------  -------
Git            Latest
CMake          3.2+
Clang          8.0+
Remill         Latest
Python         3.8
IDA Pro        7.5+
Binary Ninja   Latest
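
To check the minimum versions listed above before building, a small script along these lines may help. It is only a sketch: the helper names `version_at_least` and `check_tool` are made up for this example, it assumes GNU `sort -V` is available (as it is on Ubuntu), and it only covers CMake and Clang, the two tools in the table with a numeric minimum and a conventional `--version` flag.

```shell
#!/usr/bin/env bash

# version_at_least HAVE WANT
# Succeeds when HAVE is greater than or equal to WANT, using GNU version sort.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN_VERSION
# Reports whether NAME is on PATH and whether its reported version is new enough.
check_tool() {
    local name="$1" min="$2" have
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: missing"
        return 1
    fi
    # Take the first dotted number printed by `--version` as the tool version.
    have="$("$name" --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_at_least "$have" "$min"; then
        echo "$name: $have (ok)"
    else
        echo "$name: ${have:-unknown} (need $min+)"
        return 1
    fi
}

# Minimum versions taken from the dependency table.
check_tool cmake 3.2 || true
check_tool clang 8.0 || true
```

The script prints one line per tool and never aborts, so it can be run safely on a machine where nothing is installed yet.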

Getting and Building the Code

On Linux

First, update the package index and install the baseline dependencies.

sudo apt-get update
sudo apt-get upgrade

sudo apt-get install \
     git \
     python3.8 \
     python3-pip \
     wget \
     curl \
     build-essential \
     libtinfo-dev \
     lsb-release \
     zlib1g-dev \
     ccache \
     cmake

# Ubuntu 14.04, 16.04
sudo apt-get install realpath

Assuming Remill is properly installed, the following steps produce a fresh build of Anvill.

# clone anvill repository
git clone https://github.com/lifting-bits/anvill.git
# create a build dir
mkdir anvill-build && cd anvill-build
# configure
CC=clang cmake ../anvill 
# build
make -j 5
# install
sudo make install

Anvill's Python plugins provide the functionality needed to generate a JSON specification that describes the contents of a binary. These plugins depend on tools like IDA Pro or Binary Ninja for various analysis tasks.

Given that we have either of the above, we can try out Anvill's machine code lifter on a binary of our choice.

# First, we generate a JSON specification from a binary
python3.8 -m anvill --bin_in my_binary --spec_out spec.json
# Then, we produce LLVM bitcode from that JSON specification
./remill-build/tools/anvill/anvill-lift-json-*.0 --spec spec.json --bc_out out.bc
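
The two steps can be chained so that lifting only runs when specification generation succeeded. The following is a minimal sketch, not part of Anvill: the function name `lift_binary` and the `ANVILL_LIFT` environment variable (which should point at your `anvill-lift-json` executable) are assumptions made for this example. Only the command-line flags shown above are used.

```shell
#!/usr/bin/env bash

# lift_binary BINARY OUT_BC
# Generates a JSON specification next to OUT_BC, then lifts it to bitcode.
# ANVILL_LIFT is a hypothetical variable holding the path to the lifter.
lift_binary() {
    local bin_in="$1" out_bc="$2"
    local spec="${out_bc%.bc}.json"

    if [ ! -f "$bin_in" ]; then
        echo "error: no such binary: $bin_in" >&2
        return 1
    fi
    if [ -z "${ANVILL_LIFT:-}" ]; then
        echo "error: set ANVILL_LIFT to the anvill-lift-json executable" >&2
        return 1
    fi

    # Step 1: binary -> JSON specification (needs IDA Pro or Binary Ninja).
    python3.8 -m anvill --bin_in "$bin_in" --spec_out "$spec" || return 1

    # Refuse to continue with a missing or empty specification.
    if [ ! -s "$spec" ]; then
        echo "error: specification was not written: $spec" >&2
        return 1
    fi

    # Step 2: JSON specification -> LLVM bitcode.
    "$ANVILL_LIFT" --spec "$spec" --bc_out "$out_bc"
}
```

After sourcing the file, a call such as `lift_binary my_binary out.bc` would leave `out.json` and `out.bc` in the current directory.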

Docker image

To build via Docker, specify the architecture, base Ubuntu image, and LLVM version. For example, to build Anvill linking against LLVM 11 on Ubuntu 20.04 on AMD64, run:

ARCH=amd64; UBUNTU_VERSION=20.04; LLVM=1100; \
   docker build . \
   -t anvill-llvm${LLVM}-ubuntu${UBUNTU_VERSION}-${ARCH} \
   -f Dockerfile \
   --build-arg UBUNTU_VERSION=${UBUNTU_VERSION} \
   --build-arg ARCH=${ARCH} \
   --build-arg LLVM_VERSION=${LLVM}
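
When building several variants, it can be handy to preview the image tag and the full command before running anything. The sketch below only prints the command. The helper names are invented for this example, and which values of `ARCH`, `UBUNTU_VERSION`, and `LLVM_VERSION` the Dockerfile actually accepts is not covered here.

```shell
#!/usr/bin/env bash

# anvill_image_tag ARCH UBUNTU_VERSION LLVM
# Composes the image tag using the same naming scheme as the command above.
anvill_image_tag() {
    local arch="$1" ubuntu="$2" llvm="$3"
    echo "anvill-llvm${llvm}-ubuntu${ubuntu}-${arch}"
}

# print_build_command ARCH UBUNTU_VERSION LLVM
# Dry run: prints the docker build command instead of executing it.
print_build_command() {
    local arch="$1" ubuntu="$2" llvm="$3"
    printf 'docker build . -t %s -f Dockerfile' \
        "$(anvill_image_tag "$arch" "$ubuntu" "$llvm")"
    printf ' --build-arg UBUNTU_VERSION=%s --build-arg ARCH=%s --build-arg LLVM_VERSION=%s\n' \
        "$ubuntu" "$arch" "$llvm"
}

print_build_command amd64 20.04 1100
```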

anvill-specify-bitcode

anvill-specify-bitcode is a tool that produces specifications for all functions contained in an LLVM bitcode module. The purpose of this tool is to enable the creation of a database of specifications for commonly used, often externally-defined functions (e.g. from libc, libc++, libstdc++) in binaries lifted by McSema.

This tool also allows function declarations for binary code to be written in C or C++ and then translated down into the specification form within a decompiler toolchain.

Finally, this tool exists to enable round-trip testing of LLVM's ISEL lowering and code generation for arbitrary functions.
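
As an illustration of the C-to-bitcode half of that workflow, the sketch below compiles a couple of stub functions into an LLVM bitcode module with Clang. The function names, signatures, and the helper `make_decl_bitcode` are invented for this example. The stubs have bodies because Clang does not emit unused declarations into the module. How the resulting module is passed to anvill-specify-bitcode is not shown, since its command-line options are not documented in this section.

```shell
#!/usr/bin/env bash

# make_decl_bitcode OUT_BC
# Writes a small C file with stub definitions and compiles it to LLVM bitcode.
make_decl_bitcode() {
    local out_bc="$1" src
    src="$(mktemp --suffix=.c)" || return 1

    # Hypothetical prototypes; only the signatures matter, the bodies are stubs.
    cat > "$src" <<'EOF'
unsigned long my_strlen(const char *s) { (void) s; return 0; }
int my_add(int a, int b) { (void) a; (void) b; return 0; }
EOF

    # -c -emit-llvm makes Clang write a bitcode module instead of an object file.
    clang -c -emit-llvm "$src" -o "$out_bc"
    local status=$?
    rm -f "$src"
    return "$status"
}
```

Calling `make_decl_bitcode decls.bc` should leave a module named `decls.bc` that contains both functions.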