Goal: manage incoming connections using eBPF sk_lookup programs.
Use case: on a server with no split network namespaces, all daemons are running in the root network namespace to avoid complex network setup such as NAT and subnets. Each daemon is given a 127.0.0.0/8 IPv4 and ULA IPv6 address to listen to and must not listen to INADDR_ANY. On incoming connection, the manager shall redirect to the correct socket for the given port number using a configuration.
Operations:
- the manager is configured with a list of port number and destination address mapping
- upon receiving a new connection, the manager tries to find an already listening socket that matches, if so, it selects this socket
- upon failure, it looks up the destination port number and fetches the IP address (which might be provate) to select a socket suitable
- upon failure, it returns and the kernel will reject the packet as there is no matching socket listening
bpf_sk_lookup_tcp and bpf_sk_lookup_udp and not accessible to sk_lookup BPF programs, the kernel should be patched.
It seems that if calling bpf_sk_lookup_tcp from within a sk_lookup BPF program would cause an infinite recursion if it was possible:
__inet_lookup_listener()
innet/ipv4/inet_hashtables.c
will call the BPF programbpf_sk_lookup_tcp()
is called withinnet/core/filter.c
bpf_sk_lookup()
is calledbpf_skc_lookup()
is the next function called, which is a wrapper around__bpf_skc_lookup()
sk_lookup()
is then called after namespace lookup (if needed)- then depending on the protocol, it will call
__inet_lookup()
for TCP4 (or__udp4_lib_lookup
,__inet6_lookup
orudp6_lib_lookup
function pointer) __inet_lookup()
is back innet/ipv4/inet_hashtables.c
, it will lookup connected sockets, and if not found__inet_lookup_listener()
is called to find unconnected listening sockets- which calls the BPF program again, looping
If so, perhaps some BPF wrapper could be added to directly lookup a listening sockets from the inet/inet6 hashtables without looking up already connected sockets nor calling recursively BPF code.
It seems rather simple: add a BPF wrapper code that would work much like the
already existing bpf_sk_lookup_tcp/udp but instead of calling
__inet_lookup_listener()
, it would call directly the hashtable querying
functions (ipv4_portaddr_hash
, inet_lhash2_bucket
and inet_lhash2_lookup
)
and return a result.
This way, BPF code could query sockets and choose if it wants to lookup sockets
from a specific listening address or INADDR_ANY
or both. It would add
flexibility to BPF programs.
BPF helper functions are checked within kernel/bpf/verifier.c
and an "unknown
func" error message is returned if the function is not allowed for the given
program type. The functions available for each program type is defined in
sk_lookup_func_proto
(for the sk_lookup program type) in net/core/filter.c
.
Each program type, for a given helper function name can associate a different implementation. Here, we would want an implementation that would not cause infinite loops.
Without changing the Linux kernel, there is a way to get to it:
- On CLI, get a list of mappings : inet protocol / port to address and port
- Use sock_diag API to get the inode number of sockets matching the to part
- Parse
/proc/*/fd/*
to get a PID from a socket inode - Use
pidfd_open
andpidfd_getfd
to get file descriptor from socket - Install a sk_lookup BPF program that performs the mapping using the file descriptors collected
- If not done already, install a BPF trace program to monitor calls to
bind()
- When there is a
bind()
called, reload the listening socket table using sock_diag - if there are changes relevant to our configuration (check if file descriptors are identical using fstat on both and compare inode/dev numbers), reload the sk_lookup BPF program with the new file descriptors.
- clang with bpf backend
- libbpf
make
Work in progress. Run as root:
cat /sys/kernel/debug/tracing/trace_pipe &
src/sk_lookup_manager -t 5555=0.0.0.0:22 -t 5555=[::]:22 -t 5556=0.0.0.0:22
Run a server to check:
nc -l 5555
nc -l 127.0.0.2 5556
Connect to port 5555, you are redirected to your SSH server:
nc ::1 5555
nc 127.0.0.2 5556
- Restart sshd then retry
nc ::1 5555
, it should have detected new socket.
sudo cat /sys/kernel/debug/tracing/trace_pipe
- Cloudflare presentation: https://blog.cloudflare.com/its-crowded-in-here/
- eBPF sk_lookup: https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html
- Building: https://qmonnet.github.io/whirl-offload/2020/04/12/llvm-ebpf-asm/
- BPF Helpers: bpf-helpers(7)
- BPF System call: bpf(2)
- sk_lookup sample tools/testing/selftests/bpf/prog_tests/sk_lookup.c for the C part and tools/testing/selftests/bpf/progs/test_sk_lookup.c for the eBPF part.
- libbpf with libbpf-bootstrap and blog post
- BPF kernel development
Kernel Virtual Machine or this setup
Create the VM
wget https://cdimage.debian.org/cdimage/bullseye_di_alpha3/amd64/iso-cd/debian-bullseye-DI-alpha3-amd64-netinst.iso
qemu-img create -f raw debian.img 3G
qemu-kvm -m 512 -boot d -hda debian.img -cdrom debian-bullseye-DI-alpha3-amd64-netinst.iso
Boot debian kernel:
qemu-kvm -smp 2 -m 512 -boot c -vga std -soundhw es1370 -net nic -net user,hostfwd=tcp::10022-:22 -hda debian.img
- login: debian
- password: debian
- root password: root
- connect to VM:
ssh debian@127.0.0.1 -p 10022
- transfer files:
ssync --rsh='ssh -p 10022' root@127.0.0.1:/boot/config-5.10.0-3-amd64 ../linux/debian-config-5.10.0-3-amd64