Bitey - Bitcode Import Tool

Import LLVM bitcode directly into Python and use it as an extension module.

Warning

THIS IS PROOF-OF-CONCEPT SOFTWARE THAT DESPITE ITS CUTE NAME, MIGHT ACTUALLY BITE YOUR ENTIRE LEG OFF. USE AT YOUR OWN RISK!

Requirements

You'll need to have a pretty complete LLVM development environment installed on your machine. Bitey has been developed using LLVM/Clang-3.1. You might need to install it yourself.

In addition, you need to install the llvm-py extension. Get it at http://www.llvmpy.org.

Bitey is unlikely to work with any older version of LLVM or the llvm-py extension--especially preinstalled versions distributed with your operating system. You need to be using bleeding-edge modern versions of these libraries.

Example and Basic Tutorial

First, you need some C code. Something important like computing a Fibonacci number:

/* fib.c */
int fib(int n) {
    if (n < 3) {
       return 1;
    } else {
       return fib(n-1) + fib(n-2);
    }
}

Now, compile it into LLVM bitcode using clang:

bash % clang -emit-llvm -c fib.c

This makes an object file fib.o as usual--only the .o file contains LLVM bitcode. Now, just import it into Python:

>>> import bitey
>>> import fib
>>> fib.fib(38)
39088169
>>>

Yes, that's it. Bitey does not use the C compiler, the linker, or the dynamic loader. You don't write wrapper functions either. Write normal C, compile it with clang, and import it. Done.
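
If you want to cross-check the numbers returned by the compiled module, a pure-Python reference is handy. This is only a sketch for verification (the name fib_reference is made up for this example and is not part of Bitey); it uses the same convention as fib.c, where the first two Fibonacci numbers are both 1:

```python
def fib_reference(n):
    """Iterative Fibonacci with fib(1) == fib(2) == 1, matching fib.c."""
    a, b = 1, 1
    # Each pass advances the pair (fib(k-1), fib(k)) by one step.
    for _ in range(n - 2):
        a, b = b, a + b
    return b


print(fib_reference(10))
print(fib_reference(38))
```

The iterative form is used because a naive recursive version in pure Python takes a long time for n = 38.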

Bitey understands most basic C datatypes including integers, floats, void, pointers, arrays, and structures. Because it builds a ctypes-based interface, you access the code using the same techniques you would use with ctypes. Here is an example that mutates a value through a pointer:

/* mutate.c */

void mutate_int(int *x) {
     *x *= 2;
}

Here's how you would use this:

% clang -emit-llvm -c mutate.c
% python
>>> import bitey
>>> import mutate
>>> import ctypes
>>> x = ctypes.c_int(2)
>>> mutate.mutate_int(x)
>>> x.value
4
>>>
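
The pointer handling above is ordinary ctypes behavior, so it can be explored without any compiled code at all. The following sketch uses only the standard ctypes module (no Bitey involved) to show that a pointer to a c_int aliases the original object, which is why the change made in C is visible through x.value:

```python
import ctypes

x = ctypes.c_int(2)

# An explicit pointer object. ptr.contents shares memory with x,
# so writing through it behaves like `*x *= 2` in C.
ptr = ctypes.pointer(x)
ptr.contents.value *= 2

# byref() builds a lightweight reference that is normally passed
# straight to a foreign function expecting an int*.
ref = ctypes.byref(x)

print(x.value)
```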

Here is an example involving a structure:

/* point.c */
#include <math.h>

struct Point {
    double x;
    double y;
};

double distance(struct Point *p1, struct Point *p2) {
    return sqrt((p1->x - p2->x)*(p1->x - p2->x) +
                (p1->y - p2->y)*(p1->y - p2->y));
}

To run:

% clang -emit-llvm -c point.c
% python
>>> import bitey
>>> import point
>>> p1 = point.Point(3,4)
>>> p2 = point.Point(6,8)
>>> point.distance(p1,p2)
5.0
>>>

One subtle issue with structure wrapping is that LLVM bitcode doesn't encode the names of structure fields. So, Bitey simply assigns them to an indexed element variable like this:

>>> p1.e0         # (Returns the .x component)
3.0
>>> p1.e1         # (Returns the .y component)
4.0

This can be fixed using a pre-load file as described in the "Advanced Usage" section later.
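
To see why the field names are purely cosmetic, here is a small sketch in plain ctypes. The class names IndexedPoint and NamedPoint are invented for this illustration and are not something Bitey generates. Both structures describe two consecutive doubles, so they have the same memory layout and only the attribute names differ:

```python
import ctypes


class IndexedPoint(ctypes.Structure):
    # Field names in the indexed style (e0, e1, ...).
    _fields_ = [("e0", ctypes.c_double), ("e1", ctypes.c_double)]


class NamedPoint(ctypes.Structure):
    # The same layout with the names used in the C source.
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


p = IndexedPoint(3, 4)

# Reinterpret the raw bytes of p under the named layout.
q = NamedPoint.from_buffer_copy(bytes(p))

print(p.e0, p.e1)
print(q.x, q.y)
```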

If you need to combine two LLVM object files together into a single importable module, use llvm-ld like this:

% llvm-ld point.o fib.o -b combined.o
% python
>>> import bitey
>>> import combined
>>> combined.fib(10)
55
>>> p1 = combined.Point(3,4)
>>> p2 = combined.Point(6,8)
>>> combined.distance(p1,p2)
5.0
>>>

The C code you write can link with external libraries, but you might need to take special steps to load the library prior to import. For example, suppose you compiled the Fibonacci code into a shared library like this:

# OS-X
% gcc -bundle -export_dynamic fib.c -o libfib.so

# Linux
% gcc -shared fib.c -o libfib.so

Now, suppose you had some C code that wanted to access this library:

/* sample.c */
#include <stdio.h>
extern int fib(int n);

void print_fib(int n) {
    while (n > 0) {
        printf("%d\n", fib(n));
        n--;
    }
}

If you compile and import it normally, the import fails because fib cannot be resolved:

% clang -emit-llvm -c sample.c
% python
>>> import bitey
>>> import sample
LLVM ERROR: Program used external function 'fib' which could not be resolved!
%

However, you can load the library yourself by doing this:

% python
>>> import bitey
>>> bitey.load_library("./libfib.so")
<CDLL './libfib.so', handle 1003cfc60 at 10049d090>
>>> import sample
>>> sample.print_fib(10)
55
34
21
13
8
5
3
2
1
1
>>>

It is important to note that Bitey is NOT a wrapper generator meant to access already-compiled C libraries. It only exposes functionality that has been explicitly compiled as LLVM bitcode. To access the contents of a library, you would need to compile and link it using clang and llvm-ld as shown in the examples.

How it works

Bitey extends Python with an import hook that looks for .o files containing LLVM bitcode. Type signatures and other information in the bitcode are then used to build a ctypes-based binding to the natively compiled functions contained within an LLVM execution engine. It's all a bit magical, but the LLVM JIT generates the executable code whereas Bitey makes the ctypes binding to it---all behind the scenes on import.
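
The import-hook half of this can be sketched with the standard importlib machinery. This is not Bitey's actual code: the class ObjectFileFinder, the helper looks_like_bitcode, and the bitcode_size attribute are all invented for illustration, the sketch is written for Python 3, and it stops short of the LLVM and ctypes steps. It only shows how a finder can claim a name.o file that starts with the raw LLVM bitcode magic bytes ('BC' followed by 0xC0 0xDE):

```python
import importlib.abc
import importlib.util
import os
import sys

BITCODE_MAGIC = b"BC\xc0\xde"  # first four bytes of a raw LLVM bitcode file


def looks_like_bitcode(path):
    """Return True if the file at path starts with the bitcode magic."""
    with open(path, "rb") as f:
        return f.read(4) == BITCODE_MAGIC


class ObjectFileFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that claims <name>.o bitcode files."""

    def find_spec(self, fullname, path=None, target=None):
        name = fullname.rpartition(".")[2]
        for directory in (path or sys.path):
            candidate = os.path.join(directory or ".", name + ".o")
            if os.path.isfile(candidate) and looks_like_bitcode(candidate):
                return importlib.util.spec_from_loader(
                    fullname, self, origin=candidate)
        return None  # let other finders try

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.__file__ = module.__spec__.origin
        # A real tool would JIT the bitcode and attach ctypes wrappers here.
        # This sketch only records the size of the file it found.
        module.bitcode_size = os.path.getsize(module.__file__)


def install():
    """Append the finder so normal imports are still tried first."""
    sys.meta_path.append(ObjectFileFinder())
```

Appending to sys.meta_path, rather than inserting at the front, means ordinary Python modules keep priority and the hook only sees names that nothing else could import.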

It's important to stress that Bitey does not use the C compiler, the linker, the dynamic loader, or make calls to subprocesses. It is completely self-contained and only uses the functionality of llvm-py and ctypes.

Performance

The performance profile of Bitey is going to be virtually identical to that of using ctypes. LLVM bitcode is translated to native machine code and Bitey builds a ctypes-based interface to it in exactly the same manner as a normal C library.

As a performance experiment, here is a simple C function that checks if a number is prime or not:

int isprime(int n) {
    int factor = 3;
    /* Nothing below 2 is prime */
    if (n < 2) {
        return 0;
    }
    /* Special case for 2 */
    if (n == 2) {
        return 1;
    }
    /* Check for even numbers */
    if ((n % 2) == 0) {
        return 0;
    }
    /* Check for everything else (<= so that squares such as 9 are rejected) */
    while (factor*factor <= n) {
        if ((n % factor) == 0) {
            return 0;
        }
        factor += 2;
    }
    return 1;
}
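
Before timing anything, it is worth checking that both bindings return correct answers. A pure-Python reference for the same trial-division test makes that easy. The name isprime_reference is made up for this sketch. Note that the loop condition uses <= so that squares of odd primes such as 9 and 25 are rejected, and that values below 2 are treated as not prime:

```python
def isprime_reference(n):
    """Trial division over odd factors; returns 1 or 0 like the C version."""
    if n < 2:
        return 0
    if n == 2:
        return 1
    if n % 2 == 0:
        return 0
    factor = 3
    # Only factors up to and including sqrt(n) need to be tried.
    while factor * factor <= n:
        if n % factor == 0:
            return 0
        factor += 2
    return 1


print([n for n in range(30) if isprime_reference(n)])
```

Comparing this against the compiled function over a range of inputs is a quick sanity check before benchmarking.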

Try compiling this code into LLVM and a C shared library:

% clang -O3 -emit-llvm -c isprime.c

# OS-X
% gcc -O3 -bundle -undefined dynamic_lookup isprime.c -o isprime.so

# Linux
% gcc -O3 -shared isprime.c -o isprime.so

Now, let's put Bitey and ctypes in a head-to-head performance battle:

>>> import bitey
>>> from isprime import isprime as isprime1
>>> import ctypes
>>> ex = ctypes.cdll.LoadLibrary("./isprime.so")
>>> isprime2 = ex.isprime
>>> isprime2.argtypes=(ctypes.c_int,)
>>> isprime2.restype=ctypes.c_int
>>>
>>> from timeit import timeit
>>> # Bitey
>>> timeit("isprime1(3)","from __main__ import isprime1")
1.1813910007476807
>>> # ctypes
>>> timeit("isprime2(3)", "from __main__ import isprime2")
1.2408909797668457
>>>
>>> # Bitey
>>> timeit("isprime1(10143937)", "from __main__ import isprime1")
9.839216947555542
>>> # ctypes
>>> timeit("isprime2(10143937)", "from __main__ import isprime2")
9.663991212844849
>>>

As you can see, the performance is just about the same. The main difference would come down to the efficiency of LLVM vs. gcc code optimization.

Advanced Usage

If you're up for a bit of adventure, the module creation process can be altered through the use of pre-load and post-load files.

A pre-load file provides Python code that executes within the newly created module prior to the LLVM-binding step. One use of this code is to specify the names of fields on data structures. For example, you can create the following pre-load file for the earlier Point example:

# point.pre.py

class Point:
    _fields_ = ['x','y']

If you do this, you'll find that the field names get fixed:

>>> import point
>>> p = point.Point(3,4)
>>> p.x
3.0
>>> p.y
4.0
>>>

You could also use a pre-load file to load library dependencies:

# sample.pre.py
import bitey
bitey.load_library("./libfib.so")

A post-load file allows you to alter the contents of the module after LLVM-binding. You could use this to apply decorators or add additional support code. For example:

# point.post.py
#
# Example of decorating a function already wrapped

def decorate(func):
    def wrapper(*args, **kwargs):
        print "Calling", func.__name__
        return func(*args, **kwargs)
    wrapper.__name__ = func.__name__
    return wrapper

# Wrap the distance wrapper already created
distance = decorate(distance)

The combination of the pre/post loading files gives you almost unlimited opportunity for insane evil when loading the bitcode. It must be stressed that these files are executed in the space of the module being created---they are not separate imports (i.e., the pre, post, and LLVM bindings all co-exist in the same module namespace).

Automatic Binding

In the examples, it is necessary to use import bitey for modules to be recognized and loaded. If you want to skip this step and make everything automatic, create a bitey.pth file that contains the following statement:

# bitey.pth
import bitey

Now, copy this file to the Python site-packages directory.
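
If you are not sure where site-packages lives, the standard sysconfig module can tell you. The helper below is only a convenience sketch (write_pth is a made-up name, not part of Bitey). It relies on documented behavior of Python's site module: lines in a .pth file that begin with "import" are executed at interpreter startup. The sketch is written for Python 3:

```python
import os
import sysconfig


def write_pth(directory, filename="bitey.pth"):
    """Write a one-line .pth file that imports bitey at startup."""
    path = os.path.join(directory, filename)
    with open(path, "w") as f:
        f.write("import bitey\n")
    return path


# Directory where pure-Python packages are installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# To install, call write_pth(site_packages); this may need write permission.
```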

FAQ

Q: Will Bitey ever support C++?

A: No. C++ can bite me (*)

(*) I also wrote Swig and still have C++ scars.

Q: Why is it called "Bitey?"

A: Well, "Bitey" is so much more catchy than simply calling it something boring like "BIT (Bitcode Import Tool)". Plus, just like @johnderosa's pet Pomeranian of the same name, you're never quite sure whether "Bitey" is adorably cute or a vicious beast that will constantly nip your leg. Actually, I just like the ring of it--"Bitey" sort of rhymes with "Enterprisey".

Discussion Group

A discussion group for Bitey is available at http://groups.google.com/group/bitey

Authors