This is a small BPF filter that demonstrates how to encapsulate and decapsulate an IPv4 packet with MPLS.
The goal of this project is to be a good learning resource and a skeleton showing how to set up a project for writing and building an eBPF filter. Documentation for eBPF is largely scattered across man pages, mailing lists and blog posts. Worse, many of these sources are quite old now and don't reflect today's best practices.
The eBPF filter is found in mpls_bpf_kern.c, with the source heavily commented to help new readers understand what is going on.
This example performs MPLS-in-IP encapsulation/decapsulation as defined in RFC 4023.
MPLS-in-IP messages have the following format:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                     |
|             IP Header               |
|                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                     |
|          MPLS Label Stack           |
|                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                     |
|            Message Body             |
|                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The MPLS label stack entry is defined in RFC 3032:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label                  | TC  |S|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Label: Label Value, 20 bits
TC: Traffic Class field, 3 bits
S: Bottom of Stack, 1 bit
TTL: Time to Live, 8 bits
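The field widths above map directly onto shifts and masks. The following is a minimal sketch of packing and unpacking one label stack entry in plain user-space C; the struct and function names (`mpls_entry`, `mpls_pack`, `mpls_unpack`, `mpls_to_wire`) are made up for illustration and are not taken from mpls_bpf_kern.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One label stack entry; field widths follow the list above. */
struct mpls_entry {
    uint32_t label; /* 20 bits */
    uint8_t tc;     /* 3 bits */
    uint8_t s;      /* 1 bit, set on the last entry of the stack */
    uint8_t ttl;    /* 8 bits */
};

/* Pack the fields into a 32-bit word in host order.
 * Bits outside each field's width are masked off. */
static uint32_t mpls_pack(struct mpls_entry e)
{
    return ((e.label & 0xFFFFFu) << 12) |
           ((uint32_t)(e.tc & 0x7u) << 9) |
           ((uint32_t)(e.s & 0x1u) << 8) |
           (uint32_t)e.ttl;
}

/* Split a host-order 32-bit word back into its fields. */
static struct mpls_entry mpls_unpack(uint32_t word)
{
    struct mpls_entry e;
    e.label = word >> 12;
    e.tc = (uint8_t)((word >> 9) & 0x7u);
    e.s = (uint8_t)((word >> 8) & 0x1u);
    e.ttl = (uint8_t)(word & 0xFFu);
    return e;
}

/* Write the word in network byte order (most significant byte first). */
static void mpls_to_wire(uint32_t word, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}
```

For example, label 0x45 with TC 0, S 1 and TTL 64 packs to 0x00045140, which is the byte sequence 00 04 51 40 on the wire.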
For the purpose of this demo, the MPLS label is always 0x45. More advanced label switching would perform different actions based on the label value.
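To make the byte movement concrete, here is a rough user-space sketch of inserting and removing a single label stack entry between the IPv4 header and the message body. It works on a plain buffer that starts at the IPv4 header, not on an `sk_buff`, and it only shuffles bytes: it does not update the IP total length, protocol or checksum, which a complete encapsulation would also have to handle. The function names, the TTL of 64 and the choice to set the bottom-of-stack bit are assumptions for illustration; the real filter in mpls_bpf_kern.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPLS_HLEN 4u /* size of one label stack entry */
#define DEMO_TTL 64u /* assumed TTL, not taken from the project */

/* Insert one label stack entry right after the IPv4 header.
 * pkt points at the IPv4 header, len is the number of bytes in use and
 * cap is the buffer size. Returns the new length, or 0 on error. */
static size_t mpls_encap(uint8_t *pkt, size_t len, size_t cap, uint32_t label)
{
    assert(pkt != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4; /* header length in bytes */
    if (ihl < 20 || ihl > len || len + MPLS_HLEN > cap)
        return 0;

    /* Label(20) | TC(3) = 0 | S(1) = 1 | TTL(8) */
    uint32_t word = ((label & 0xFFFFFu) << 12) | (1u << 8) | DEMO_TTL;

    /* Shift the message body right to open a 4-byte gap. */
    memmove(pkt + ihl + MPLS_HLEN, pkt + ihl, len - ihl);
    pkt[ihl + 0] = (uint8_t)(word >> 24);
    pkt[ihl + 1] = (uint8_t)(word >> 16);
    pkt[ihl + 2] = (uint8_t)(word >> 8);
    pkt[ihl + 3] = (uint8_t)word;
    return len + MPLS_HLEN;
}

/* Remove the label stack entry that follows the IPv4 header and report
 * its label value. Returns the new length, or 0 on error. */
static size_t mpls_decap(uint8_t *pkt, size_t len, uint32_t *label)
{
    assert(pkt != NULL && label != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;
    if (ihl < 20 || ihl + MPLS_HLEN > len)
        return 0;

    const uint8_t *p = pkt + ihl;
    uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    *label = word >> 12;

    /* Shift the message body left over the removed entry. */
    memmove(pkt + ihl, pkt + ihl + MPLS_HLEN, len - ihl - MPLS_HLEN);
    return len - MPLS_HLEN;
}
```

Calling `mpls_encap` and then `mpls_decap` on the same buffer should return the original bytes and report the label that was inserted.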
A simple script, test.sh, is included that will:
- Create two network namespaces: machine-1 & machine-2
- Create a virtual network interface pair, with one end in each network namespace
- Set up the network interfaces so they can ping each other
- Add a qdisc to the network interfaces
- Attach the compiled BPF filter via tc
After running the script you should see the output of ping:
Pinging from machine-1 to machine-2
PING 10.132.204.33 (10.132.204.33) from 10.132.204.25 : 56(84) bytes of data.
64 bytes from 10.132.204.33: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 10.132.204.33: icmp_seq=2 ttl=64 time=0.089 ms
64 bytes from 10.132.204.33: icmp_seq=3 ttl=64 time=0.133 ms
64 bytes from 10.132.204.33: icmp_seq=4 ttl=64 time=0.035 ms
64 bytes from 10.132.204.33: icmp_seq=5 ttl=64 time=0.089 ms
To verify that everything is working, check the debug trace logs (run the following in the host namespace):
# Turn on tracing logs
echo 1 > /sys/kernel/debug/tracing/tracing_on
# Turn on the debug flag
sudo ./mpls.bin enable
Successfully enabled.
# Confirm it's enabled
sudo ./mpls.bin show
debug flag: true
# You can cat the pipe
cat /sys/kernel/debug/tracing/trace_pipe
ping-11635 [000] ..s1 136779.910443: 0: [decap][815794764]finished mpls decap.
ping-11635 [000] .... 136780.935386: 0: [encap][2508757858]starting mpls encap.
ping-11635 [000] .... 136780.935404: 0: [encap][2508757858]casted to eth header.
ping-11635 [000] .... 136780.935406: 0: [encap][2508757858]casted to ip header.
ping-11635 [000] .... 136780.935408: 0: [encap][2508757858]calculated ip header length.
ping-11635 [000] .... 136780.935412: 0: [encap][2508757858]about to store bytes of MPLS label: 0x45
ping-11635 [000] .... 136780.935414: 0: [encap][2508757858]finished mpls encap.
ping-11635 [000] ..s1 136780.935426: 0: [decap][1560953898]starting mpls decap.
ping-11635 [000] ..s1 136780.935428: 0: [decap][1560953898]decoded MPLS label: 0x45
ping-11635 [000] ..s1 136780.935430: 0: [decap][1560953898]finished mpls decap.
You can list all BPF programs loaded:
bpftool prog
42: sched_cls tag c2678af39418836e
xlated 1640B jited 963B memlock 4096B
43: sched_cls tag aa3fa6025585b31a
xlated 1888B jited 1100B memlock 4096B
You can also view the output of the JIT if you run the following.
bpftool prog dump jited id 42
...
3b4: mov $0x1e,%esi
3b9: callq 0xffffffffc7ed7c16
3be: jmpq 0x0000000000000197
...
# On older Linux kernels, you have to turn on the JIT explicitly
# echo 1 > /proc/sys/net/core/bpf_jit_enable
You can also use llvm-objdump to see the contents of the eBPF object file:
# -g prints the line numbers
# -S prints the instructions with associated C code
llvm-objdump -S -g mpls.bpf
You should also be able to see the BPF map pinned on the filesystem:
sudo tree /sys/fs/bpf/tc
/sys/fs/bpf/tc
└── globals
└── DEBUGS_MAP
1 directory, 1 file
sudo bpftool map show id 53 -f
53: array flags 0x0
key 4B value 1B max_entries 1 memlock 4096B
pinned /sys/fs/bpf/tc/globals/DEBUGS_MAP
An mpls.bin command is provided that allows interacting with the loaded eBPF program.
You can enable or disable the debug output, which controls whether the debug print messages appear in /sys/kernel/debug/tracing/trace_pipe.
./mpls.bin show
debug flag: false
./mpls.bin enable
Successfully enabled.
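As a rough idea of what a tool like this has to do, the sketch below reads and writes the single 1-byte entry of the pinned DEBUGS_MAP (key 4B, value 1B, as shown by bpftool above) using the raw bpf(2) system call. This is not necessarily how mpls.bin is implemented; the helper names (`sys_bpf`, `open_pinned_map`, `read_debug_flag`, `write_debug_flag`) are hypothetical, and the calls typically need root.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the bpf(2) system call. */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
    return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Open a map pinned on the BPF filesystem; returns a file descriptor or -1. */
static int open_pinned_map(const char *path)
{
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    int fd = sys_bpf(BPF_OBJ_GET, &attr);
    return fd < 0 ? -1 : fd;
}

/* Read the 1-byte value stored at key 0. Returns 0 on success, -1 on error. */
static int read_debug_flag(const char *path, uint8_t *flag)
{
    int fd = open_pinned_map(path);
    if (fd < 0)
        return -1;

    uint32_t key = 0; /* the array map has a single entry */
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)&key;
    attr.value = (uint64_t)(uintptr_t)flag;
    int rc = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
    close(fd);
    return rc < 0 ? -1 : 0;
}

/* Store a 1-byte value at key 0. Returns 0 on success, -1 on error. */
static int write_debug_flag(const char *path, uint8_t flag)
{
    int fd = open_pinned_map(path);
    if (fd < 0)
        return -1;

    uint32_t key = 0;
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)&key;
    attr.value = (uint64_t)(uintptr_t)&flag;
    attr.flags = BPF_ANY; /* create or overwrite the entry */
    int rc = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

With the map pinned as shown earlier, `write_debug_flag("/sys/fs/bpf/tc/globals/DEBUGS_MAP", 1)` would correspond to the enable command, and `read_debug_flag` to show. Both return -1 if the map is not pinned or the caller lacks permission.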
The test.sh script begins by deleting any network namespaces left over from a previous run, so every run starts fresh.
You can build on an Oracle Linux 7.6 machine; there is also limited support for OSX via Docker.
On Oracle Linux, simply use the provided Makefile.
make
On OSX, use the provided Makefile as well, but be sure to run the docker target.
# Install the latest llvm if you don't have it
brew install --with-toolchain llvm
# Build the BPF filter in an Oracle Linux Docker image
make docker