
Flexible, high-performance TCP offload to SmartNICs using fine-grained parallelism

Primary language: C. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism

FlexTOE is a flexible yet high-performance TCP offload engine (TOE) for SmartNICs. It eliminates almost all host data-path TCP processing and is fully customizable. FlexTOE interoperates well with other TCP stacks, is robust under adverse network conditions, and supports POSIX sockets.

FlexTOE focuses on data-path offload of established connections, avoiding complex control logic and packet buffering in the NIC. It leverages fine-grained parallelization of the TCP data path and segment reordering to achieve high performance on wimpy SmartNIC architectures, while a modular design keeps it flexible. We compare FlexTOE on a Netronome Agilio-CX40 to the Linux and TAS host TCP stacks and to the Chelsio Terminator TOE. We find that Memcached scales up to 38% better on FlexTOE than on TAS, while FlexTOE saves up to 81% of host CPU cycles compared with Chelsio. FlexTOE provides competitive RPC performance, even with wimpy SmartNICs, and cuts 99.99th-percentile RPC RTT by 3.2× versus Chelsio and by 50% versus TAS. FlexTOE's data-path parallelism generalizes across hardware architectures, improving single-connection RPC throughput by up to 2.4× on x86 and 4× on BlueField.
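Segment reordering is what keeps a parallel data path correct: workers may finish segments of the same connection out of order, but later stages must still see them in sequence. The C sketch below shows the general technique with a ticket-indexed reorder buffer. The structure, names, and slot count are hypothetical illustrations of the idea and are not taken from FlexTOE's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: each segment receives a ticket (0, 1, 2, ...) when it
 * enters the parallel stage. Workers complete tickets in any order; the
 * reorder buffer releases results strictly in ticket order. */
#define ROB_SLOTS 8u /* must exceed the number of segments in flight */

struct rob {
    uint32_t next_release;        /* next ticket the downstream stage expects */
    bool     done[ROB_SLOTS];     /* slot finished for ticket % ROB_SLOTS? */
    uint32_t payload[ROB_SLOTS];  /* stand-in for per-segment results */
};

/* Called by a worker once it has processed the segment with `ticket`. */
static void rob_complete(struct rob *r, uint32_t ticket, uint32_t result)
{
    uint32_t slot = ticket % ROB_SLOTS;
    assert(ticket - r->next_release < ROB_SLOTS); /* ticket inside the window */
    assert(!r->done[slot]);                       /* slot not already in use */
    r->payload[slot] = result;
    r->done[slot] = true;
}

/* Copies every consecutive finished result, in ticket order, into out[]
 * (at most `max` of them). Returns how many results were released. */
static size_t rob_drain(struct rob *r, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (n < max && r->done[r->next_release % ROB_SLOTS]) {
        uint32_t slot = r->next_release % ROB_SLOTS;
        out[n++] = r->payload[slot];
        r->done[slot] = false;
        r->next_release++;
    }
    return n;
}
```

If tickets 1 and 2 finish before ticket 0, `rob_drain` releases nothing until ticket 0 completes, and then releases all three in order. The unsigned subtraction in the window check stays correct when the 32-bit ticket counter wraps.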

FlexTOE's data path can be extended with programs written in C and with XDP programs written in eBPF. This makes it possible to implement popular data center transport features, such as TCP tracing, packet filtering and capture, VLAN stripping, flow classification, firewalling, and connection splicing.
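As a rough illustration of the logic a firewalling or flow-classification stage performs, the C sketch below matches a packet's flow key against an ordered rule list, first match wins. The verdict enum mirrors XDP's pass/drop verdicts in spirit only; the types, rule format, and function are hypothetical and are not FlexTOE's extension interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts, analogous to XDP_PASS / XDP_DROP. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Hypothetical flow key; fields are kept in host byte order for simplicity. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical rule: source prefix plus an optional destination port. */
struct fw_rule {
    uint32_t saddr, saddr_mask; /* matches if (key.saddr & mask) == saddr */
    uint16_t dport;             /* 0 matches any destination port */
    enum verdict action;
};

/* Returns the action of the first matching rule, or `dflt` if none match. */
static enum verdict fw_classify(const struct fw_rule *rules, size_t nrules,
                                const struct flow_key *k, enum verdict dflt)
{
    for (size_t i = 0; i < nrules; i++) {
        const struct fw_rule *r = &rules[i];
        if ((k->saddr & r->saddr_mask) != r->saddr)
            continue;                       /* source prefix mismatch */
        if (r->dport != 0 && r->dport != k->dport)
            continue;                       /* destination port mismatch */
        return r->action;
    }
    return dflt;
}
```

For example, a two-rule list that passes 10.0.0.0/8 traffic to port 11211 and then drops everything else admits a 10.0.0.1 client but drops one from 192.168.0.1.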

For more information, please refer to:

Prerequisites

Run

  • Run make to build.
  • Set up 1G hugepages (as root): create and mount hugetlbfs at /dev/hugepages-1048576kB, then reserve four 1G pages on each NUMA node.
mkdir -p /dev/hugepages-1048576kB
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages-1048576kB
echo 4 | tee /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
  • Load the driver, bind the network device, and run FlexTOE:
# Load driver
modprobe uio
insmod kernel/flextoe_uio.ko

# Show supported network devices and obtain the pci_id
./scripts/devbind.py --status

# Bind network device to driver
./scripts/devbind.py -b flextoe_uio <pci_id>

# Run FlexTOE (--ip-addr is required; see Usage)
./user/flextoe.out --ip-addr=IP[/PREFIXLEN]
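Before launching FlexTOE, it can help to confirm that the 1G hugepages were actually reserved. The C helper below is a hypothetical convenience, not part of the repository. It reads a single counter from the kernel's standard sysfs hugepage directory; the choice of free_hugepages as the counter to check is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Standard Linux sysfs counter of free 1 GB hugepages (system-wide). */
#define HUGEPAGES_1G_FREE \
    "/sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages"

/* Reads one unsigned decimal integer from `path` (e.g. a sysfs counter).
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
static int read_sysfs_ulong(const char *path, unsigned long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -1;

    char *end;
    errno = 0;
    unsigned long v = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;              /* overflow or no digits */
    *out = v;
    return 0;
}

/* Returns 1 if at least `min_pages` 1 GB hugepages are free, else 0
 * (also 0 when the counter cannot be read). */
static int hugepages_1g_available(unsigned long min_pages)
{
    unsigned long free_pages;
    if (read_sysfs_ulong(HUGEPAGES_1G_FREE, &free_pages) != 0)
        return 0;
    return free_pages >= min_pages;
}
```

For example, hugepages_1g_available(1) returns 1 only if at least one 1G page is still free after the reservation step.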

Usage

Usage: ./user/flextoe.out [OPTION]... --ip-addr=IP[/PREFIXLEN]

Memory Sizes:
  --shm-len=LEN               Shared memory len [default: 1073741824]
  --nic-rx-len=LEN            Kernel rx queue len [default: 256]
  --nic-tx-len=LEN            Kernel tx queue len [default: 256]
  --app-kin-len=LEN           App->Kernel queue len [default: 1048576]
  --app-kout-len=LEN          Kernel->App queue len [default: 1048576]

TCP protocol parameters:
  --tcp-rtt-init=RTT          Initial rtt for CC (us) [default: 50]
  --tcp-link-bw=BANDWIDTH     Link bandwidth (gbps) [default: 40]
  --tcp-rxbuf-len=LEN         Flow rx buffer len [default: 8192]
  --tcp-txbuf-len=LEN         Flow tx buffer len [default: 8192]
  --tcp-handshake-timeout=TIMEOUT  Handshake timeout (us) [default: 10000]
  --tcp-handshake-retries=RETRIES  Handshake retries [default: 10]

Congestion control parameters:
  --cc=ALGORITHM              Congestion-control algorithm [default: dctcp-rate]
     Options: dctcp-win, dctcp-rate, const-rate, timely
  --cc-control-granularity=G  Minimal control iteration [default: 50]
  --cc-control-interval=INT   Control interval (multiples of RTT) [default: 2]
  --cc-rexmit-ints=INTERVALS  # of RTTs without ACKs before rexmit [default: 4]
  --cc-dctcp-weight=WEIGHT    DCTCP: EWMA weight for ECN rate [default: 0.062500]
  --cc-dctcp-mimd=INC_FACT    DCTCP: enable multiplicative inc  [default: disabled]
  --cc-dctcp-min=RATE         DCTCP: minimum cap for flow rates  [default: 0]
  --cc-const-rate=RATE        Constant rate for all flows [default: 0]
  --cc-timely-tlow=TIME       Timely: low threshold (us) [default: 30]
  --cc-timely-thigh=TIME      Timely: high threshold (us) [default: 150]
  --cc-timely-step=STEP       Timely: additive increment step (kbps) [default: 10000]
  --cc-timely-init=RATE       Timely: initial flow rate (kbps) [default: 10000]
  --cc-timely-alpha=FRAC      Timely: EWMA weight for rtt diff [default: 0.020000]
  --cc-timely-beta=FRAC       Timely: mult. decr. factor [default: 0.800000]
  --cc-timely-minrtt=RTT      Timely: minimal rtt without queueing [default: 11]
  --cc-timely-minrate=RATE    Timely: minimal rate to use [default: 10000]

IP protocol parameters:
  --ip-route=DEST[/PREFIX],NEXTHOP  Add route
  --ip-addr=ADDR[/PREFIXLEN]        Set local IP address

ARP protocol parameters:
  --arp-timeout=TIMEOUT       ARP request timeout (us) [default: 500]
  --arp-timeout-max=TIMEOUT   ARP request max timeout (us) [default: 10000000]

Miscellaneous:
  --fp-poll-interval-app      App polling interval before blocking in us [default: 10000]
  --quiet                     Disable non-essential logging [default: disabled]
  --debug-console             Enable debug console [default: disabled]
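The congestion-control defaults above can be read against the textbook update rules. The C sketch below assumes the DCTCP EWMA and decrease rule from RFC 8257 applied to a rate instead of a window, a hypothetical additive-increase step for DCTCP (no such option appears in the usage text), and the TIMELY gradient rules from Mittal et al. (SIGCOMM 2015) without hyperactive increase. It is not FlexTOE's dctcp-rate or timely implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: DCTCP and TIMELY updates on a per-flow rate in kbps.
 * Structs, names, and update points are assumptions. */

struct dctcp_state {
    double alpha;      /* EWMA of the fraction of ECN-marked bytes */
    double rate_kbps;  /* current flow rate */
};

/* One control iteration. `weight` plays the role of --cc-dctcp-weight and
 * `min_kbps` of --cc-dctcp-min; `incr_kbps` is a hypothetical step. */
static void dctcp_update(struct dctcp_state *s, uint64_t acked_bytes,
                         uint64_t ecn_bytes, double weight, double incr_kbps,
                         double min_kbps)
{
    double frac = acked_bytes ? (double)ecn_bytes / (double)acked_bytes : 0.0;
    s->alpha = (1.0 - weight) * s->alpha + weight * frac;
    if (ecn_bytes > 0)
        s->rate_kbps *= 1.0 - s->alpha / 2.0;  /* DCTCP decrease */
    else
        s->rate_kbps += incr_kbps;             /* assumed additive increase */
    if (s->rate_kbps < min_kbps)
        s->rate_kbps = min_kbps;
}

/* TIMELY knobs, named after the --cc-timely-* options. */
struct timely_params {
    double tlow_us, thigh_us, step_kbps, alpha, beta, minrtt_us, minrate_kbps;
};

struct timely_state {
    double prev_rtt_us;  /* previous RTT sample */
    double rtt_diff_us;  /* EWMA of consecutive RTT differences */
    double rate_kbps;
};

/* Gradient-based update for one RTT sample (hyperactive increase omitted). */
static void timely_update(struct timely_state *s, const struct timely_params *p,
                          double rtt_us)
{
    double new_diff = rtt_us - s->prev_rtt_us;
    s->prev_rtt_us = rtt_us;
    s->rtt_diff_us = (1.0 - p->alpha) * s->rtt_diff_us + p->alpha * new_diff;
    double gradient = s->rtt_diff_us / p->minrtt_us;

    if (rtt_us < p->tlow_us)
        s->rate_kbps += p->step_kbps;                   /* well below target */
    else if (rtt_us > p->thigh_us)
        s->rate_kbps *= 1.0 - p->beta * (1.0 - p->thigh_us / rtt_us);
    else if (gradient <= 0.0)
        s->rate_kbps += p->step_kbps;                   /* RTT not rising */
    else
        s->rate_kbps *= 1.0 - p->beta * gradient;       /* RTT rising */

    if (s->rate_kbps < p->minrate_kbps)
        s->rate_kbps = p->minrate_kbps;
}
```

With the default weight of 0.0625 (1/16), a control interval in which every acknowledged byte is ECN-marked raises alpha from 0 to 0.0625 and trims the rate by 3.125%. For TIMELY with the defaults, a 300 us sample exceeds the 150 us high threshold, so the rate is multiplied by 1 - 0.8 × (1 - 150/300) = 0.6.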