Move to nftables (from iptables) and use flowtables to improve performance
Opened this issue · 5 comments
Hey Bill -
Migrating our firewall to nftables with flowtables offers significant performance improvements (lower CPU, higher throughput, reduced latency) for our routers, especially with established connections. This is achieved through a more efficient rule processing engine and, crucially for capable hardware, the ability to offload connection tracking and forwarding directly to the network interface hardware, completely bypassing the CPU for those packets. This not only scales better but also simplifies management in the long run.
Thanks to named sets that can be updated atomically at runtime, managing dynamic blacklists, whitelists, or even network service discovery becomes trivial, enabling more responsive security policies without performance penalties.
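For example (a minimal sketch; the table, chain, and set names here are illustrative and assume a `table inet filter` with an `input` chain already exists), a dynamic blocklist is just a named set whose elements can be added or removed at runtime without reloading the ruleset:

```
# Illustrative only: create a dynamic set and a rule that references it.
nft add set inet filter blocklist '{ type ipv4_addr; flags dynamic; }'
nft add rule inet filter input ip saddr @blocklist drop
# Entries can then be managed on the fly, e.g. from monitoring scripts:
nft add element inet filter blocklist '{ 203.0.113.15 }'
nft delete element inet filter blocklist '{ 203.0.113.15 }'
```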
Software Flow Offloading (Applicable to ALL hardware, including Raspberry Pi):
Mechanism: For established TCP and UDP connections, flowtables enable a "fast path" that bypasses the bulk of the Netfilter chain traversal. The kernel performs a quick lookup in the flowtable at the earliest possible hook (ingress). If a match is found, the packet is immediately forwarded without further rule evaluation in forward, postrouting, etc.
Impact on Raspberry Pi 4B+:
Reduced CPU Cycles: The CPU spends significantly fewer cycles per packet for established flows, freeing up resources for other tasks (like WireGuard crypto or other services running on the Pi).
Higher PPS: The Pi can handle a much higher packet per second rate before CPU saturation, making it more capable of sustaining gigabit speeds, especially with many concurrent connections.
Lower Latency: Less processing time in the kernel translates directly to lower latency for ongoing data streams.
Even on resource-constrained devices like the Raspberry Pi, software flow offloading provides a measurable 20-50% (or more) reduction in CPU usage and a corresponding increase in throughput for established connections. This pushes the Pi's performance closer to line-rate for common traffic patterns.
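As a minimal sketch of the software fast path (interface names are placeholders; the fuller RaspAP-oriented ruleset appears later in this issue), a flowtable plus a single `flow add` rule is all that is needed:

```
table inet fastpath_demo {
    flowtable ft {
        hook ingress priority 0;
        devices = { eth0, wlan0 };   # physical interfaces to accelerate
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
        # Established TCP/UDP flows are added to the flowtable and then
        # bypass most of the forward path on subsequent packets.
        meta l4proto { tcp, udp } flow add @ft
    }
}
```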
On capable server hardware, hardware flow offloading effectively transforms the firewall into a wire-speed appliance for established traffic. This liberates the CPU for value-added services like robust VPN encryption, deep packet inspection, or application-layer firewalls, enabling true multi-gigabit routing performance without compromise.
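The configuration difference for hardware offload is a single flag on the flowtable; the ruleset only loads if the NIC driver supports it, so treat this as a sketch for capable hardware:

```
flowtable ft {
    hook ingress priority 0;
    devices = { eth0, eth1 };
    flags offload;   # push matched flows down to the NIC if the driver supports it
}
```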
While iptables commands on modern kernels might be translated to nftables bytecode (the xtables-nft backend), this only addresses syntax. It doesn't unlock the fundamental architectural advantages of nftables like flowtables, atomic updates, or direct hardware offload that a native nftables configuration provides.
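You can check which backend a given system is using; the translation layer identifies itself in the iptables version string:

```
# Prints "(nf_tables)" when iptables commands are translated to nftables,
# or "(legacy)" when the old xtables backend is in use.
iptables -V
```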
While there's an initial learning investment, the unified syntax and clearer logic eventually simplify rule management. The long-term benefits in performance and maintainability far outweigh this initial overhead. We can start with a well-documented baseline configuration.
nftables also does a better job as a VPN "kill switch". Granular Control: nftables allows precise rules that mark traffic intended for the VPN and strictly enforce that only marked traffic can leave via the WAN. This is incredibly reliable.
Kernel-Level Enforcement: The rules are enforced directly by the Linux kernel, making them highly resilient to application crashes or user-space errors.
Zero-Trust by Default: By setting default policies to drop and only allowing traffic through specific, controlled paths (the VPN tunnel), nftables enables a truly secure-by-default posture.
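In sketch form (interface names, subnet, and mark value are placeholders; the full ruleset later in this issue wires the same idea into RaspAP's interfaces), the kill switch is three forward-chain rules:

```
# Inside a forward chain with 'policy drop':
ip saddr 192.168.50.0/24 ip daddr != 192.168.50.0/24 meta mark set 0x100   # mark Internet-bound LAN traffic
meta mark 0x100 oifname "tun0" accept                                      # marked traffic may only leave via the VPN
ip saddr 192.168.50.0/24 oifname "eth0" meta mark != 0x100 counter drop    # anything else headed for the WAN is dropped
```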
I know this is somewhat of a radical departure from what you have, but on modern systems iptables commands are already being translated to nftables under the hood (iptables is the old way). That translation is syntax-only and does not embrace the flowtables concept. Moving fully to nftables, and better still to flowtables, will give your user community a massive speed boost.
https://docs.kernel.org/networking/nf_flowtable.html
https://www.ubicloud.com/blog/improving-network-performance-with-linux-flowtables
https://thermalcircle.de/doku.php?id=blog:linux:flowtables_1_a_netfilter_nftables_fastpath
This is untested, but here is the core idea/analysis:
https://x.com/i/grok/share/JhpTjXDz3gJFSBZ5Rxfm37id4
RaspAP would be responsible for maintaining this script: clearing the prior rules and rerunning nft to apply the new ruleset whenever a VPN/tun interface changes.
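A hypothetical hook (script name, paths, and arguments are illustrative) showing how RaspAP could update the dynamic VPN set on tunnel up/down, or simply reload the generated ruleset atomically:

```
#!/bin/sh
# Hypothetical vpn-change hook: $1 = interface (e.g. tun0 or wg0), $2 = up|down
VPN_IF="$1"
ACTION="$2"

if [ "$ACTION" = "up" ]; then
    # Add the tunnel to the set referenced by the kill switch and NAT rules.
    nft add element inet filter active_vpn_interfaces "{ $VPN_IF }"
else
    nft delete element inet filter active_vpn_interfaces "{ $VPN_IF }"
fi

# Alternative: regenerate and atomically reload the whole ruleset
# (the file below starts with 'flush ruleset'):
# nft -f /etc/nftables.conf
```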
For Intel/Debian-based hardware with a server-class NIC that supports flowtable offloading, RaspAP's performance would smoke any other platform out there.
```
# Define variables for interfaces and networks
define WAN_IF = eth0 # Your WAN/Internet interface
define WLAN_AP_IF = wlan0 # Your RaspAP wireless AP interface (LAN side)
define LAN_NET = 192.168.50.0/24 # Your RaspAP LAN subnet
define INTERNAL_NET = 192.168.0.0/16 # Broader internal network for INPUT rules
# Mark for VPN traffic (choose a unique mark value, e.g., 0x100 for VPN traffic)
define VPN_MARK = 0x100
# --- Flush existing nftables rules ---
flush ruleset
# --- Table for IPv4 and IPv6 filtering ---
# Dynamic sets must be declared inside a table, so both sets live in 'inet filter'.
table inet filter {
# --- Dynamic sets for interfaces and allowed MACs ---
# These sets will be populated and managed by external scripts.
# Set to hold the names of currently active VPN interfaces
# Type 'ifname' ensures it stores interface names (e.g., "tun0", "wg0")
set active_vpn_interfaces {
type ifname;
flags dynamic; # Allows elements to be added/removed after creation
# Initial empty state. Scripts will add/remove.
}
# Set to hold MAC addresses of automatically allowed clients on WLAN_AP_IF
# Type 'ether_addr' for MAC addresses.
set allowed_macs_on_lan {
type ether_addr;
flags dynamic;
# Add any fixed MACs here if you have them, otherwise it starts empty.
# elements = { 00:11:22:33:44:55, AA:BB:CC:DD:EE:FF }
}
# 1. Define the flowtable for high-speed software forwarding
flowtable ft_fastpath {
hook ingress priority 0;
# Include WAN and LAN interface. VPN interfaces will be added dynamically by the script.
# It's okay to initially define it only with fixed interfaces; the kernel will adapt.
devices = { $WAN_IF, $WLAN_AP_IF };
counter;
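# NOTE: 'flags offload' enables hardware flow offload and requires NIC/driver support;
# remove the line below to use software-only flow offload (e.g., on a Raspberry Pi).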
flags offload;
}
# 2. Base Chain for Input traffic (to the router itself)
chain input {
type filter hook input priority 0;
policy drop; # Default to drop for security
# Allow loopback interface traffic
iif lo accept
# Allow established and related connections (for router's own outgoing connections)
ct state { established, related } accept
# Allow traffic from internal networks (LAN_NET) to access the router (e.g., SSH, web GUI)
ip saddr $LAN_NET accept
# Allow DHCP renew/replies on WAN
iif $WAN_IF udp sport 67 udp dport 68 accept # DHCPv4
iif $WAN_IF icmp type echo-request accept # Allow pings from WAN (optional)
# Allow incoming VPN connection if router is a VPN server (adjust port)
# iif $WAN_IF udp dport 51820 accept # For WireGuard server
# iif $WAN_IF tcp dport 1194 accept # For OpenVPN TCP server
# iif $WAN_IF udp dport 1194 accept # For OpenVPN UDP server
# Allow specific MAC addresses (defined in set) to access the router on LAN (e.g., if you had a stricter default)
# This is more relevant for forwarding, but included for completeness if needed here.
# iif $WLAN_AP_IF ether saddr @allowed_macs_on_lan accept
# Drop invalid packets
ct state invalid drop
}
# 3. Base Chain for Forward traffic (through the router)
chain forward {
type filter hook forward priority 0;
policy drop; # Default to drop anything not explicitly allowed
# Offload established TCP/UDP flows to the fastpath flowtable.
# This rule must come before the blanket established/related accept below,
# otherwise offload-eligible packets are accepted first and never reach it.
meta l4proto { tcp, udp } flow add @ft_fastpath
# Crucial: Allow established/related connections for conntrack to work.
ct state { established, related } accept
# --- Dynamic VPN Kill Switch Logic ---
# 1. Mark traffic from your LAN that is NOT to your LAN (i.e., Internet-bound)
ip saddr $LAN_NET ip daddr != $LAN_NET meta mark set $VPN_MARK counter
# 2. Allow marked VPN traffic to go to any interface in the dynamic VPN set
# Note: the VPN set holds interface *names*, so it is matched with oifname/iifname.
meta mark $VPN_MARK oifname @active_vpn_interfaces accept
# 3. CRITICAL KILL SWITCH RULE: Drop any traffic from your LAN that is *not* marked
# for VPN if it tries to exit via the WAN.
ip saddr $LAN_NET oif $WAN_IF meta mark != $VPN_MARK counter drop comment "VPN Kill Switch: Block unmarked LAN traffic to WAN"
# --- Anti-Leak for VPN Tunneling (VPN to VPN) ---
# Block traffic between VPN interfaces
iifname @active_vpn_interfaces oifname @active_vpn_interfaces drop
# --- General Forwarding Rules (adjust if needed, kill switch covers main leak) ---
# Allow traffic from VPN interfaces into LAN
iifname @active_vpn_interfaces oif $WLAN_AP_IF accept
# Allow VPN traffic to go to WAN (if VPN is the primary internet exit for clients)
iifname @active_vpn_interfaces oif $WAN_IF accept
# --- Automatic MAC Address Access ---
# Allow new connections from MAC addresses found in the 'allowed_macs_on_lan' set to go to WAN or VPN
# This assumes your clients get IPs from $LAN_NET.
ip saddr $LAN_NET iif $WLAN_AP_IF ether saddr @allowed_macs_on_lan ct state new,established,related accept
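# The 'allowed_macs_on_lan' set can be updated at runtime from a shell without a
# ruleset reload, e.g.:
#   nft add element inet filter allowed_macs_on_lan '{ aa:bb:cc:dd:ee:ff }'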
# Drop invalid packets
ct state invalid drop
}
# 4. Base Chain for Output traffic (from the router itself)
chain output {
type filter hook output priority 0;
policy accept; # Generally allow the router to make outbound connections
# Allow router to connect to VPN endpoints even if VPN is down (for reconnection)
# You need to replace "VPN_SERVER_IP" with the actual IP address(es) of your VPN provider's servers.
# This prevents the kill switch from blocking the VPN *client* from reconnecting.
# If your VPN service uses multiple IPs, list them in a set.
# ip daddr { VPN_SERVER_IP_1, VPN_SERVER_IP_2 } tcp dport { 443, 1194 } accept # For TCP OpenVPN
# ip daddr { VPN_SERVER_IP_1, VPN_SERVER_IP_2 } udp dport { 1194, 51820 } accept # For UDP OpenVPN/WireGuard
# IMPORTANT: If your VPN uses DNS to resolve the server, you *must* allow DNS traffic *before* the kill switch for the router itself.
# Otherwise, the router won't be able to resolve the VPN server IP when the tunnel is down.
# ip daddr { 8.8.8.8, 8.8.4.4, 1.1.1.1 } udp dport 53 accept # Example: Google & Cloudflare DNS
# If you have specific upstream DNS servers (e.g., ISP's), list them here.
# This rule should be above `ct state invalid drop`
# ct state invalid drop
}
}
# --- NAT table for Masquerading (Postrouting) ---
table ip nat { # 'ip' family for IPv4 NAT
chain postrouting {
type nat hook postrouting priority 100;
# Masquerade LAN_NET traffic leaving any active VPN interface
ip saddr $LAN_NET oifname @active_vpn_interfaces masquerade
# Masquerade LAN_NET traffic leaving WAN_IF (ONLY if you want to allow direct internet access
# for clients NOT using VPN, or if the kill switch is effectively off)
# If ALL LAN traffic MUST go through VPN, comment out this rule.
# ip saddr $LAN_NET oif $WAN_IF masquerade
}
}
```
@frankozland great suggestion. Thanks also for the backgrounder and starter scripts. The advantages are clear.
We've already taken the first steps towards this migration in our CI build. However, this is just the start. There's a substantial amount of work (and even more testing) to be done across the application (installer, API, VPNs, firewall, plugins, etc.).
Sounds great, Bill. If you get Debian + flowtables implemented, I am very happy to take point on testing; I have a very high-latency connection and can do a real metric comparison.
Parallel to this optimization is marking packets before nftables using eBPF. This adds a layer of complexity, but the gains may be worth it.
nftables with flowtables is a fine target for the initial boost. For even better performance, intercept packets at the NIC and mark them appropriately before nftables is ever called: a drop in latency, a drop in CPU, protection from DDoS, and a further increase in speed.
Executive version: all this does is move packet marking to the kernel/NIC driver level before nftables even starts. If the hardware supports it, the CPU stays exceptionally low, even under a sustained DDoS attack. But more importantly, on Debian Intel/6.15 hardware with a server-grade NIC, the kernel would ONLY process packets that are allowed; script kiddies trying to probe the device would never get through. This enhances security and reduces the CPU cost of bad actors.
Though it does add complexity, the drop in latency, the drop in CPU, and the increase in throughput make it worth it. nf flowtables are still in play, but pushing more work up the chain, before nftables, boosts performance and raises security to datacenter class.
https://x.com/i/grok/share/fOhK3rodYE27iKm4eFQBlBbyO
Adding NAT for user-specified ports (single, list, or range) with eBPF marking maintains 25–55% CPU reduction and 10–40% throughput increase on a Debian 6.15 system with a server-grade NIC, achieving ~8–9.5 Gbps vs. ~6–7.5 Gbps for VPN traffic with NAT. The kill switch remains fail-safe, any TUN device/VPN is supported via dynamic eBPF maps, and MAC passthrough for SSH is secure. XDP offloading maximizes gains, especially for dropping non-specified ingress.
Key Takeaways
nftables + eBPF offers the best performance: 25–60% CPU reduction, 10–50% throughput increase, and 15–50% higher packet rate across all devices, with NAT overhead mitigated by eBPF pre-filtering.
Kernel 6.6 Limitation: RaspAP’s kernel 6.6 on Pis/RK3399 lacks some eBPF features (e.g., advanced conntrack), slightly reducing gains compared to Debian’s kernel 6.15.
Oh boy. Yeah, adding eBPF to the mix would be great. To be clear, this migration is a truly massive amount of work. The best (only?) approach to maintain sanity would be to tackle this in phases, i.e., start with the installer and go from there.

