/dg

[LLVM Static Slicer] Dependence graph for programs. Generic implementation of dependence graphs with instantiation for LLVM that contains a static slicer for LLVM bitcode

Primary LanguageC++MIT LicenseMIT

DG

Dg is a library which implements a dependence graph for programs. It contains a set of generic templates that can be specialized to user's needs. Dg can be used for different analyses, optimizations or program slicing (we currently use it for the last one). As a part of dg, you can find a pointer analyses, reaching definitions analysis and a static slicer for LLVM.

Whole project is under hard developement and lacks documentation for now, so in the case of need, contact us by an e-mail (below).

LLVM DependenceGraph && llvm-slicer

We have implemented a dependence graph for LLVM and a static slicer for LLVM.

Requirements & Compilation

LLVM DependenceGraph needs LLVM 3.4 or higher (tested on 3.4, 3.7, 3.8, and 3.9). We recently started supporting higher versions than 3.4 so if you have problems with the compilation, please let us know (file an issue). Also, the project has been restructuralized, so if you already have it cloned and after update it fails building, try cleaning the build directory and configure & make it again.

git clone https://github.com/mchalupa/dg
cd dg

Once you have the project cloned, you need to configure it. Fully manual configuration would look like this:

cmake -DLLVM_SRC_PATH=/path/to/src -DLLVM_BUILD_PATH=/path/to/build -DLLVM_DIR=path/to/llvm/share/llvm/cmake .

LLVM_DIR is an environment variable used by LLVM to find cmake config files (it points to the build or install directory), LLVM_SRC_DIR is a variable that tells cmake where to look for llvm's sources and it is used to override include paths. The same holds for LLVM_BUILD_PATH that is used to override library paths. Usually, you don't need to specify all these variables. When LLVM is installed on your system in standard paths, the configuration should look just like:

cmake .

If there is any collision (i.e. there's more versions of LLVM installed), you may need to define the LLVM_DIR variable to point to the directory where are the config files of the installed version ($PREFIX/share/llvm/cmake or $PREFIX/lib/cmake/llvm/ for newer versions). If you have LLVM compiled from sources, but not installed anywhere, you may need to use LLVM_SRC_PATH and LLVM_BUILD_PATH variables too. For the last case, suppose you have LLVM built in /home/user/llvm-build from sources in /home/user/llvm-src. Then following configuration should work:

cmake -DLLVM_SRC_PATH=/home/user/llvm-src -DLLVM_BUILD_PATH=/home/user/llvm-build -DLLVM_DIR=/home/user/llvm-build/share/llvm/cmake .

After configuring the project, usual make takes place:

make -j4

Testing

You can run tests with make check or make test. To change the pointer analysis used while testing, you can export DG_TESTS_PTA variable before running tests and set it to one of fi, fs or old.

Using the slicer

Compiled slicer can be found in the tools subdirectory. First, you need to compile your program into LLVM (make sure you are using the correct version of LLVM binaries if you have more of them):

clang -c -emit-llvm source.c -o bytecode.bc

If the program is split into more programs (exactly one of them must contain main), you must compile all of them separately (as above) and then link the bytecodes using llvm-link:

llvm-link bytecode1.bc bytecode2.bc ... -o bytecode.bc

Now you're ready to slice the program:

./llvm-slicer -c slicing_criterion bytecode.bc

The slicing_criterion is a call-site of some function or ret to slice with respect to the return value of the main function.

To export the dependence graph to .dot file, use -dump-dg switch with llvm-slicer or a stand-alone tool llvm-dg-dump:

./llvm-dg-dump bytecode.bc > file.dot

You can highligh nodes from dependence graph that will be in the slice using -mark switch:

./llvm-dg-dump -mark slicing_criterion bytecode.bc > file.dot

When using -dump-dg with llvm-slicer, the nodes whould be already highlighted. Also a .dot file with the sliced dependence graph is generated (similar behviour can be achieved with llvm-dg-dump using the -slice switch).

In the tools/ directory, there are few scripts for convenient manipulation with the sliced bytecode. First is a sliced-diff.sh. This script takes file and shows differences after slicing. It uses meld or kompare or just diff program to display the results (the first that is available on the system, in this order)

./llvm-slicer -c crit code.bc
./slicer-diff.sh code.bc

Another script is a wrapper around the llvm-dg-dump. It uses xdot or evince or okular (or xdg-open). It takes exactly the same arguments as the llvm-dg-dump:

./llvmdg-show -mark crit code.bc

If the dependence graph is too big to be displayed using .dot files, you can debug the slice right from the LLVM. Just pass -annotate option to the llvm-slicer and it will store readable annotated LLVM in file-debug.ll (where file.bc is the name of file being sliced). There are more options (try llvm-slicer -help for all of them), but the most interesting is probably the -annotate slicer:

./llvm-slicer -c crit -annotate slicer code.bc

The content of code-debug.ll will look like this:

; <label>:25                                      ; preds = %20
  ; x   call void @llvm.dbg.value(metadata !{i32* %i}, i64 0, metadata !151), !dbg !164
  %26 = load i32* %i, align 4, !dbg !164
  %27 = add nsw i32 %26, 1, !dbg !164
  ; x   call void @llvm.dbg.value(metadata !{i32 %27}, i64 0, metadata !151), !dbg !164
  store i32 %27, i32* %i, align 4, !dbg !164
  ; x   call void @llvm.dbg.value(metadata !{i32* %j}, i64 0, metadata !153), !dbg !161
  ; x   %28 = load i32* %j, align 4, !dbg !161
  ; x   %29 = add nsw i32 %28, 1, !dbg !161
  ; x   call void @llvm.dbg.value(metadata !{i32 %29}, i64 0, metadata !153), !dbg !161
  ; x   br label %20, !dbg !165

.critedge:                                        ; preds = %20
  ; x   call void @llvm.dbg.value(metadata !{i32* %j}, i64 0, metadata !153), !dbg !166
  ; x   %30 = load i32* %j, align 4, !dbg !166
  ; x   %31 = icmp sgt i32 %30, 99, !dbg !166
  ; x   br i1 %31, label %19, label %32, !dbg !166

Other interesting debugging options are ptr, rd, dd, cd, postdom to annotate points-to information, reaching definitions, data dependences, control dependences or post-dominator information. You can provide comma-separated list of more options (-annotate cd,slice,dd)

Example

We can try slice for example this program (with respect to the assertion):

#include <assert.h>
#include <stdio.h>

long int fact(int x)
{
	long int r = x;
	while (--x >=2)
		r *= x;

	return r;
}

int main(void)
{
	int a, b, c = 7;

	while (scanf("%d", &a) > 0) {
		assert(a > 0);
		printf("fact: %lu\n", fact(a));
	}

	return 0;
}

Let's say the program is stored in file fact.c. So we translate it into LLVM and then slice:

$ cd tools
$ clang -c -emit-llvm fact.c -o fact.bc
$ ./llvm-slicer -c __assert_fail fact.bc

The output is in fact.sliced, we can look at the result using llvm-dis or using sliced-diff.sh script:

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %a = alloca i32, align 4
  br label %1

; <label>:1                                       ; preds = %4, %0
  %2 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i32 0, i32 0), i32* %a)
  %3 = icmp sgt i32 %2, 0
  br i1 %3, label %4, label %safe_return

; <label>:4                                       ; preds = %1
  %5 = load i32, i32* %a, align 4
  %6 = icmp sgt i32 %5, 0
  br i1 %6, label %1, label %7

; <label>:7                                       ; preds = %4
  call void @__assert_fail(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str1, i32 0, i32 0), ... [truncated])
  unreachable

safe_return:                                      ; preds = %1
  ret i32 0
}

To get this output conveniently, you can use:

./sliced-diff.sh fact.bc

Tools

The tools subdirectory contains a set of useful programs for debugging and playing with the llvm bitcode. Except for the llvm-slicer you can find there:

  • llvm-dg-dump - Dump the dependence graph for given program to graphviz format (to stdout)
  • llvmdg-show - wrapper for llvm-dg-dump that displays the graph as pdf
  • llvm-ps-dump - dump pointer subgraph and results of the points-to analysis to stdout
  • ps-show - wrapper for llvm-ps-dump that prints the PS in grapviz to pdf
  • llvm-rd-dump - display reaching definitions in llvm-bitcode
  • rd-show - wrapper for llvm-rd-dump
  • llvm-to-source - find lines from the source code that are in given file
  • llvm-to-source.py - wrapper around llvm-to-source that gives an HTML output

All these programs take as an input llvm bitcode, for example:

./ps-show code.bc

will show the pointer state subgraph for code.bc and the results of points-to analysis. Some useful switches for all programs are -pta fs and -pta fi that switch between flow-sensitive and flow-insensitive points-to analysis within all these programs that use points-to analysis. llvm-slicer and llvm-dg-dump still use the old points-to analysis by default, so when -pta switch is not used, the points-to analysis is different from that one that is used in llvm-ps-dump.


You can find more information about dg in http://is.muni.cz/th/396236/fi_m/thesis.pdf or you can write e-mails to: statica@fi.muni.cz mchqwerty@gmail.com