py2bpf translates functions from python to bpf, which is a linux kernel bytecode.
This project is the working material behind a pycon talk, and not really a fully formed or fleshed-out project. It's essentially unsupported, so use at your own risk.
bpf is a virtual machine bytecode that can be executed in the linux kernel in a variety of different places as hooks. You can hook things like packet arrival (at the socket with socket filter, within tc, or even within the NIC with xdp), software events (with kprobe and uprobe), and more.
In order to ensure safety, many operations are illegal in bpf, including arbitrary memory access, loops, recursion, and backward jumps. Instructions first pass through a verifier. On some architectures like x86_64, this bytecode is then compiled to raw machine code, so it's very low-overhead to execute.
See: _translation/_translate.py
First, we have to convert python's stack-based bytecode into a form that deals with discrete variables. This means turning something like this
push(5)
push(10)
multiply()
into something that looks more like this
x = 5
y = 10
z = multiply(x, y)
Specifically, we're associating all of the instructions with their inputs and outputs. Typically, an instruction will have anywhere from zero to multiple inputs and a single output, but some instructions will have multiple outputs -- ROT_THREE swaps the top 3 elements of the stack around, so it'll have 3 inputs and 3 outputs.
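As a rough illustration of the idea (not the code in py2bpf itself), the conversion can be sketched with a symbolic stack that holds variable names instead of values. The toy instruction set (`push`, `multiply`) and the function name `stack_to_vars` are hypothetical and only mirror the pseudocode above.

```python
import itertools


def stack_to_vars(program):
    """Turn toy stack instructions into (output, op, inputs) assignments."""
    names = (f"v{i}" for i in itertools.count())
    stack = []        # symbolic stack: holds variable names, not values
    assignments = []
    for op, *args in program:
        if op == "push":
            out = next(names)
            assignments.append((out, "const", tuple(args)))
            stack.append(out)
        elif op == "multiply":
            # Pop the two operands; the right-hand side is on top of the stack.
            rhs = stack.pop()
            lhs = stack.pop()
            out = next(names)
            assignments.append((out, "multiply", (lhs, rhs)))
            stack.append(out)
        else:
            raise ValueError(f"unknown toy instruction: {op}")
    return assignments


program = [("push", 5), ("push", 10), ("multiply",)]
for out, op, inputs in stack_to_vars(program):
    print(out, "=", op, inputs)
```

Each instruction ends up tied to named inputs and a named output, which is the shape the later translation steps need.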
This gets a little complicated with control flow -- an if/else statement gives two potential flows. We normalize this by iterating over every possible control path. This works because bpf disallows loops, recursion, and other interesting approaches, so we know iterating over every path won't be too expensive.
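A minimal sketch of walking every control path, assuming the function has already been split into basic blocks with a successor map. The block names and the `enumerate_paths` helper are hypothetical. Because the graph has no backward edges, the walk always terminates.

```python
def enumerate_paths(successors, entry):
    """Yield every path from the entry block to a block with no successors.

    Assumes the control-flow graph is acyclic, as bpf requires.
    """
    pending = [(entry, (entry,))]
    while pending:
        block, path = pending.pop()
        next_blocks = successors.get(block, ())
        if not next_blocks:
            yield path
            continue
        for nxt in next_blocks:
            pending.append((nxt, path + (nxt,)))


# An if/else: two branches that rejoin at a common exit block.
cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
paths = sorted(enumerate_paths(cfg, "entry"))
```

Note that the number of paths can grow quickly with the number of independent branches, so this approach relies on the translated functions staying small.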
See: _translation/_vars.py
Obviously most python functionality is not available within bpf, so we need to evaluate as much of the program as we can before handing it off to be translated. To do so, we convert all global loads and dereferences into constants and then evaluate things as far down the line as we can. As a result, the following would be permitted:
packet_short = py2bpf.funcs.load_skb_short(24)
if packet_short == socket.htons(12345):
    return 0
This works because socket.htons(12345) can be eagerly evaluated to 0x3930. The following will not work, because we'd need to evaluate it in the bpf context where socket.ntohs is not available.
packet_short = py2bpf.funcs.load_skb_short(24)
# DOES NOT WORK!!!
if socket.ntohs(packet_short) == 12345:
    return 0
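The difference between the two cases can be sketched with a toy folder over expression tuples. This is an illustration only: the tuple representation (`const`, `runtime`, `call`) and the `fold` function are hypothetical, and py2bpf works on bytecode rather than on a tree like this.

```python
import socket


def fold(expr):
    """Evaluate a call eagerly when all of its arguments are constants."""
    kind = expr[0]
    if kind in ("const", "runtime"):
        return expr
    _, func, args = expr  # kind == "call"
    folded_args = [fold(arg) for arg in args]
    if all(arg[0] == "const" for arg in folded_args):
        # Everything is known at translation time, so run it in python now.
        return ("const", func(*(arg[1] for arg in folded_args)))
    # At least one input only exists inside the kernel: the call survives.
    return ("call", func, folded_args)


# socket.htons(12345) has only constant inputs, so it folds away.
foldable = fold(("call", socket.htons, [("const", 12345)]))

# socket.ntohs(packet_short) depends on a runtime value, so it cannot fold.
stuck = fold(("call", socket.ntohs, [("runtime", "packet_short")]))
```

In the first case the result is a plain constant that bpf can compare against. In the second, a python call is left over that has no bpf equivalent.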
See: _translation/_folding.py
In the bpf world, we don't have a magical runtime to store our variables for us, so we need to allocate space on the stack for each. We can and should take a more sophisticated approach wherein we monitor lifetimes and reuse stack space, but currently we just use a bump allocator -- we decrement the stack offset every time we see a new pointer. In theory this can run us out of stack space, but it hasn't been a problem in practice.
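A bump allocator of this kind can be sketched in a few lines. The class name, the fixed 8-byte slot size, and the method names are assumptions for illustration. The 512-byte default reflects the kernel's bpf stack limit.

```python
class BumpStackAllocator:
    """Hand out negative frame-pointer offsets and never reuse a slot."""

    def __init__(self, stack_size=512):
        self.stack_size = stack_size
        self.offset = 0   # grows downward from the frame pointer
        self.slots = {}   # variable name -> stack offset

    def allocate(self, name, size=8):
        if name in self.slots:
            return self.slots[name]
        if -(self.offset - size) > self.stack_size:
            raise MemoryError("out of bpf stack space")
        self.offset -= size
        self.slots[name] = self.offset
        return self.offset


alloc = BumpStackAllocator()
x_offset = alloc.allocate("x")
y_offset = alloc.allocate("y")
```

With 8-byte slots, the default limit leaves room for 64 variables before the allocator gives up, which is why lifetime tracking would be the natural next improvement.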
See: _translation/_stack.py
At this point, we've got something that looks a lot more like bpf and a lot less like python's stack machine. Now we can translate it to bpf using simple templates. For example, if we see something like return 7, we'll translate it into a mov instruction which puts the immediate value 7 into the return register R0, and then we'll add the ret instruction.
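To make the return 7 case concrete, here is a sketch that emits the two instructions as raw 8-byte eBPF encodings. The helper names are hypothetical and this is not the template code in py2bpf. The opcode values come from the kernel's bpf headers, where the instruction that ends a program is named BPF_EXIT (assumed here to be what the text above calls ret). The register byte layout assumes a little-endian host.

```python
import struct

BPF_MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K
BPF_EXIT = 0x95       # BPF_JMP | BPF_EXIT
R0 = 0                # return-value register


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one eBPF instruction: opcode, registers, offset, immediate."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


def translate_return_const(value):
    """Template for `return <constant>`: load R0, then exit."""
    return [
        encode(BPF_MOV64_IMM, dst=R0, imm=value),
        encode(BPF_EXIT),
    ]


insns = translate_return_const(7)
program_hex = b"".join(insns).hex()
```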
See: _bpf/_template_jit.py
py2bpf supports native bpf datastructures like map. These datastructures can be transparently shared between functions compiled to bpf and the userspace processes associated with them. For example, you could implement a socket filter that simply counts packets by protocol using the following.
m = py2bpf.datastructures.create_map(ctypes.c_uint32, ctypes.c_uint64, 16)

def filter_fn(skb):
    m[skb.protocol] += 1
    return 0
In the above example, you could read the results from other python functions with something like:
for proto, count in m.items():
    print('{} => {}'.format(proto, count))
You can also use bpf perf queues.
q = py2bpf.datastructures.BpfQueue(ctypes.c_int)

@py2bpf.kprobe.probe('sys_close')
def on_sys_close(pt_regs):
    # The low 32 bits of the helper's return value hold the pid
    pid = py2bpf.funcs.get_current_pid_tgid() & 0xffffffff
    ptr = py2bpf.funcs.addrof(pid)
    cpuid = py2bpf.funcs.get_smp_processor_id()
    py2bpf.funcs.perf_event_output(pt_regs, q, cpuid, ptr)
    return 0

with on_sys_close():
    for pid in q:
        print('pid={}'.format(pid))
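The masking step deserves a note. The kernel's get_current_pid_tgid helper packs two 32-bit ids into one 64-bit value, with the tgid in the upper half and the pid in the lower half. The following plain python sketch shows the split. The function name `split_pid_tgid` and the sample ids are made up for illustration.

```python
def split_pid_tgid(value):
    """Split the helper's 64-bit result into (tgid, pid)."""
    tgid = value >> 32          # upper 32 bits
    pid = value & 0xFFFFFFFF    # lower 32 bits
    return tgid, pid


# A made-up value: tgid 4321 in the upper half, pid 4325 in the lower half.
packed = (4321 << 32) | 4325
tgid, pid = split_pid_tgid(packed)
```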
Limitations in the bpf bytecode mean that a lot of functionality is provided via helper functions which can be called from bpf programs. These helpers appear to you as simple python functions which can be invoked in compiled functions. For example, the following will issue a printk to /sys/kernel/debug/tracing/trace_pipe.
def fn(ctx):
    ...
    py2bpf.funcs.trace_printk("Hello World!")
    ...
These functions are generally described in /usr/include/linux/bpf.h. The major difference in signature is that it's annoying in python to have to specify the array or string and then the length, so we provide the length transparently instead (see fill_array_size_args in py2bpf/funcs.py:Func).
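The idea of filling in lengths automatically can be sketched as follows. This is not the fill_array_size_args implementation: the function name `expand_size_args` is hypothetical, and it assumes strings are passed as NUL-terminated byte buffers followed by their size, as the C helper signatures expect.

```python
def expand_size_args(args):
    """Insert a length after every string or bytes argument."""
    expanded = []
    for arg in args:
        if isinstance(arg, str):
            # C helpers expect a NUL-terminated buffer.
            arg = arg.encode() + b"\0"
        if isinstance(arg, (bytes, bytearray)):
            expanded.extend((arg, len(arg)))
        else:
            expanded.append(arg)
    return expanded


# The caller writes one argument; the helper receives buffer plus size.
c_style_args = expand_size_args(["Hello World!"])
```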
See more examples in py2bpf/examples/, but here is the abbreviated version.
Socket filtering was the original use case of bpf (which actually stands for "Berkeley Packet Filter"). Here, we write a function that matches only IPv4/TCP packets.
def match(skb):
    if skb.protocol != socket.htons(ETH_P_IP):
        return 0
    elif py2bpf.funcs.load_skb_byte(skb, 23) != socket.IPPROTO_TCP:
        return 0
    else:
        return skb.len

sf = py2bpf.socket_filter.SocketFilter(match)
sf.attach(raw_sock)
raw_sock.recv(...) # And then do recv stuff
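The offset 23 in the filter is the 14-byte Ethernet header plus 9, the position of the protocol field within the IPv4 header. The following userspace sketch applies the same checks to a raw frame, assuming the frame starts at the Ethernet header. `match_frame` and `make_frame` are hypothetical helpers, and the constants are spelled out because the socket module does not export ETH_P_IP on every platform.

```python
import struct

ETH_P_IP = 0x0800          # EtherType for IPv4
IPPROTO_TCP = 6
ETH_HEADER_LEN = 14        # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_PROTOCOL_OFFSET = 9   # protocol field within the IPv4 header


def match_frame(frame):
    """Return the frame length for IPv4/TCP frames, 0 otherwise."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return 0
    if frame[ETH_HEADER_LEN + IPV4_PROTOCOL_OFFSET] != IPPROTO_TCP:
        return 0
    return len(frame)


def make_frame(ethertype, protocol):
    """Build a minimal Ethernet + IPv4 header with zeroed addresses."""
    ip_header = bytearray(20)
    ip_header[0] = 0x45  # version 4, header length 5 words
    ip_header[IPV4_PROTOCOL_OFFSET] = protocol
    return bytes(12) + struct.pack("!H", ethertype) + bytes(ip_header)


tcp_frame = make_frame(ETH_P_IP, IPPROTO_TCP)
udp_frame = make_frame(ETH_P_IP, 17)
```

As in the bpf version, a return value of 0 means the packet is dropped by the filter and a non-zero value is the number of bytes to keep.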
Kernel probes allow you to trace individual execution points within the linux kernel. The following example attaches a probe to sys_nanosleep; the probe body is a stub that simply returns 0.
@py2bpf.kprobe.probe('sys_nanosleep')
def watch_nanosleep(pt_regs):
    return 0

with watch_nanosleep():
    # do things
    pass
tc is a utility that allows you to do everything from traffic shaping to filtering. It has classifiers, which identify traffic, and actions, which act upon traffic. You can implement either in bpf. Here's an example of a classifier paired with the drop action which allows us to filter incoming IPv4 traffic.
def drop_fn(skb):
    if skb.protocol == socket.htons(ETH_P_IP):
        return 1
    return 0

fil = py2bpf.tc.IngressFilter(drop_fn)
fil.install()
fil.close()
Note that the filter in this case outlives the process. You must call clear_ingress_filter to get rid of it.
See also: the tc(8) man page