/code-morphing

Code Morphing pass for LLVM

Primary LanguageC++

CMP – Code Morphing Pass

This pass does static code morphing on the IR of the llvm compiler. It has been developed as a project for an exam on cryptography.

It’s licensed under the GPL v3 as stated in every source file in the project. For any further information on the license please refer to http://www.gnu.org/licenses/gpl.html

Project organization

The whole project is split in three part:

  • An LLVM pass that does the instruction morphing.
  • A code generator that starting from a language that we have defined builds the alternatives table used inside the pass.
  • A testing project that I used to test if the transformation pass works on a real algorithm (AES from polar)

All the three project are hosted on github you can find them at:

LLVM Pass
well, I suppose you already know where it is ;)
Code Generator
Code Morphing Generator
Testing Project
Code Morphing Test

In the following section you should find all the details necessary to build everything and be up and running. If anything… just ask.

Quick Start

It is strongly encouraged to keep source tree separated from build tree, so we assume the following directory structure:

  • Let `LLVM_WORK` be the working directory
  • Let `LLVM_SRC` the LLVM sources directory
  • Let `LLVM_BUILD` the LLVM objects directory
  • Let `LLVM_ROOT` the LLVM root installation directory

You can use whichever working directory you want. The other directory are assumed to be children of `LLVM_WORK`. From now on:

  • Let `LLVM_SRC` be `$LLMV_WORK/src`
  • Let `LLVM_BUILD` be `$LLVM_WORK/build`
  • Let `LLVM_ROOT` be `$LLVM_ROOT/root`

First we create the working directory:

$ mkdir $LLVM_WORK

We need LLVM and CLANG. We can get them from Git repositories:

$ cd $LLVM_WORK
$ git clone http://llvm.org/git/llvm.git $LLVM_SRC
$ cd $LLVM_SRC
$ git clone http://llvm.org/git/clang.git tools/clang
$ cd tools/clang

We will build in `LLVM_BUILD`:

$ mkdir $LLVM_BUILD
$ cd $LLVM_BUILD
$ $LLVM_SRC/configure --prefix=$LLVM_ROOT --enable-optimized
$ make -j $N
$ make install

In order to start working with CMP you have to clone its master repository on GitHub into LLVM `projects` directory:

$ cd $LLVM_SRC/projects
$ git clone https://github.com/mminutoli/code-morphing.git

You need to build the `configure` script:

$ cd code-morphing/autoconf
$ ./AutoRegen.sh

Then you have to generate the alternatives builders. To do that check out Code Morphing Generator project and build it. After that write your instruction alternatives source file.

$ cd $LLVM_SRC/projects
$ cd code-morphing/includes/generated
$ ./<path-to-generator> <alternative-source.ia>

Finally you have to build it:

$ cd $LLVM_BUILD/projects
$ mkdir code-morphing
$ cd code-morphing
$ $LLVM_SRC/projects/code-morphing/configure --prefix=$LLVM_ROOT
$ make -j $N

After building, tests are run using:

$ make check

Transformation details

The problem

Embedded processor are subject to cross-side channel attacks due to the simplicity of their Instruction Set (IS) and the lack of hardware counter measure to this issue.

A possible solution would be to rewrite at Run Time (RT) some instruction with pattern of instruction semantically equivalent. Even though this approach works well in other environments, it is not applicable in situation were the code is executed from a static ram.

The two main problems that discourage this approach are:

  • long latency to switch the SRAM in a writable state
  • SRAM can be written a limited amount of times.

Proposed solution

To overcame the limitation explained above this pass works on the IR of LLVM to generate a final binary that has a randomized Control Flow (CF) without the need of memory rewriting at RT.

The steps taken by the transformation are:

  • Add a the alternatives vector inside the module
  • Declare a randomize function
  • For every function that need it add a choice vector and an initialization loop
  • For every basic block that need it add the alternative flows

Lets explain the components of the final code in more details:

Alternative vector
This vector contains, for every instruction that has alternatives declared, the number of available alternatives.
Randomize function
This is a function that generate an integer random number between 0 and its argument included.
Choice vector
This vector contains, for every instruction that has alternatives, the selected alternative. At the start of the function the loop get initialized using the Randomize function.

In this way the code produced by the Pass contains all the alternatives available for the instructions and the randomization of the CF is done using a random function and without the need to rewrite memory. Clearly there is no free lunch. The improved security comes at the cost of increasing the binary size.

Adding instruction alternatives

The steps required to add alternatives to an instruction starting from scratch are:

  • Add the instruction to the InstructionTy enumeration
  • Declare the alternative numbers
  • Handle the instruction inside getInstTy
  • Implement the build function
  • Add tests for the new alternative

All these steps are done automatilly from the Code Morphing Generator.

Add the instruction to the enumeration

Open the file include/cmp/InstructionAlternativeUtils.h and search for the enumeration InstructionTy. Add the name of the instruction you want to handle.

Important : The name used for the instruction must be the same of the one used in llvm for the instruction in the Instruction enumeration.

Declare the alternative numbers

Always in the same file, but some line later, there is a table that declare the alternatives available for all the instruction in the InstructionTy.

Add a line of the kind:

CMP_SET_ALTERNATIVE_NUMBER(INST, NUM);

Where INST is the name used in InstructionTy and NUM is an integer value corresponding to the alternatives you want to provide. (start counting from 1 :) )

Handle the instruction inside getInstTy

The implementation need a function able to do the reverse mapping from the llvm instruction to the enumeration InstructionTy.

This mapping is given by the getInstTy function.

Open the lib/CodeMorphing/InstructionAlternativeUtils.cpp and add to the body of the function a CHECK_INST invocation for your instruction.

Looking at the CHECK_INST macro you will understand why the name in the enumeration InstructionTy must be the same used in llvm.

Implement the build function

Open the lib/CodeMorphing/InstructionAlternatives.cpp and implement a specialization for the function template buildAlternatives for your instruction.

The function must return a vector containing all the alternative Basic Blocks. The terminator instruction will be put automatically by the transformation pass, so don’t put them.

Implement test

Last but not least implement test in the usual way with llvm.

Testing with real algorithms

If you want to test this pass with a real algorithm please take a look at Code Morphing Test.