Low throughput when using this driver
yrpang opened this issue · 2 comments
I tried to use a 100Gbps QSFP28 DAC cable to connect two U50s and ran an iperf speed test, but the result was only about 26Gbps.
Is this expected? What is the maximum speed that the driver can achieve?
Hi @yrpang,
The Linux networking stack has some CPU overhead. Measuring performance takes some experimentation, and results can vary depending on your machines and how many cores you use. It's hard for a single core to saturate a 100G link. To measure performance, you might want to try something like:
Run n=8 iperf3 processes, each with a 5 Gb/s target, and sum them to measure a total of, e.g., ~32 Gb/s, where each process looks like the following:
taskset -c 7 iperf3 -c 192.168.20.2 -p 36696 --bind 192.168.20.4 --cport 35986 -t 40 -b 5G > iperf_client_7.log &
Or similarly, with n=16 and each at 5 Gb/s, you might measure a total of, e.g., ~52 Gb/s.
(iperf3 is single-threaded, so each process can use only one core.)
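As a rough sketch of the multi-process approach, the helper below prints one pinned iperf3 client command per core rather than launching anything. The function name `gen_clients`, the base port 5201, and the use of cores 0 to n-1 are assumptions for illustration; it also assumes a matching iperf3 server is already listening on each port, and it leaves out the `--bind` and `--cport` options from the example command.

```shell
# Dry run: print one pinned iperf3 client command per core; nothing is launched.
# Assumptions (not from the original setup): client i uses core i and
# connects to port base_port + i, where an iperf3 server is already listening.
gen_clients() (
  n=$1          # number of iperf3 client processes
  server=$2     # server IP address
  base_port=$3  # port used by the first client
  rate=$4       # per-process target bitrate, e.g. 5G
  secs=${5:-40} # test duration in seconds
  i=0
  while [ "$i" -lt "$n" ]; do
    printf 'taskset -c %d iperf3 -c %s -p %d -t %d -b %s > iperf_client_%d.log &\n' \
      "$i" "$server" "$((base_port + i))" "$secs" "$rate" "$i"
    i=$((i + 1))
  done
)

gen_clients 8 192.168.20.2 5201 5G
```

To actually start the clients from the current shell, something like `eval "$(gen_clients 8 192.168.20.2 5201 5G)"; wait` should work, after reviewing the printed commands first.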
These numbers are just based on one setup that I used a while back. With the DPDK driver and pktgen you can more easily reach line rate depending on the capabilities of your machines.
--Chris
Thank you very much for your reply.
The 26Gbps result was obtained with iperf2 (iperf version 2.0.5 (2 June 2018) pthreads), using iperf -c 192.168.4.2 -P 4 on the client side and iperf -s on the server side.
I've also tried iperf3 as follows:
- Start 5 iperf3 servers listening on 5 different ports.
- Use the following script to start 5 iperf3 clients, each bound to a CPU core.
taskset -c 1 iperf3 --client 192.168.4.2 -p 5202 -t 40 -b 20G > iperf_client_1.log &
taskset -c 2 iperf3 --client 192.168.4.2 -p 5203 -t 40 -b 20G > iperf_client_2.log &
taskset -c 3 iperf3 --client 192.168.4.2 -p 5204 -t 40 -b 20G > iperf_client_3.log &
taskset -c 4 iperf3 --client 192.168.4.2 -p 5205 -t 40 -b 20G > iperf_client_4.log &
taskset -c 5 iperf3 --client 192.168.4.2 -p 5201 -t 40 -b 20G > iperf_client_5.log &
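The server side is not shown above. A minimal sketch, assuming ports 5201 to 5205 to match the `-p` values in the client script, could look like the following. The helper name `gen_servers` is hypothetical, and running the servers as daemons with `-D` is an assumption about how they are kept in the background; the function only prints the commands.

```shell
# Dry run: print one iperf3 server command per port; nothing is started.
# Assumption: ports 5201-5205, matching the -p values in the client script.
gen_servers() (
  first_port=$1  # port of the first server
  count=$2       # number of servers to start
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s: server mode, -p: listen port, -D: run as a daemon in the background
    printf 'iperf3 -s -p %d -D\n' "$((first_port + i))"
    i=$((i + 1))
  done
)

gen_servers 5201 5
```

Each iperf3 server handles one test at a time, which is why one server per client port is used here.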
The results are as follows:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-40.00 sec 22.7 GBytes 4.88 Gbits/sec 5432 sender
[ 5] 0.00-40.04 sec 22.7 GBytes 4.87 Gbits/sec receiver
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-40.00 sec 23.5 GBytes 5.06 Gbits/sec 4837 sender
[ 5] 0.00-40.04 sec 23.5 GBytes 5.05 Gbits/sec receiver
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-40.00 sec 24.5 GBytes 5.25 Gbits/sec 7406 sender
[ 5] 0.00-40.04 sec 24.4 GBytes 5.24 Gbits/sec receiver
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-40.00 sec 34.5 GBytes 7.41 Gbits/sec 11417 sender
[ 5] 0.00-40.04 sec 34.5 GBytes 7.40 Gbits/sec receiver
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-40.00 sec 32.9 GBytes 7.06 Gbits/sec 7023 sender
[ 5] 0.00-40.04 sec 32.9 GBytes 7.05 Gbits/sec receiver
These 5 clients add up to about 30Gbps. I also tried running 1, 2, 3, or 4 clients, and the total bandwidth was also about 30Gbps. It seems that no matter how many clients are started, the total bandwidth stays around 30Gbps.
Also, to check whether the CMAC link rate is the problem, I connected the FPGA to a Mellanox ConnectX-5 and used ethtool to check the negotiated rate on the ConnectX-5. It reported a speed of 100Gbps, but the iperf result was still about 26Gbps.
It seems odd that the total bandwidth doesn't scale with the number of CPU cores. There appears to be a bottleneck somewhere that limits the total bandwidth, but I don't know where it is.
Is there anything I'm missing, or are there tests I should add? If more information is needed, please let me know and I will provide it. Thank you very much for your help!
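For reference, the total can be computed by summing the receiver-side bitrates from the five logs. The sketch below does this with awk; the function name `sum_receiver_gbps` is hypothetical, and it assumes every summary line reports its bitrate in Gbits/sec, as in the output above.

```shell
# Sum the receiver-side bitrates (in Gbits/sec) from iperf3 summaries on stdin.
# Assumption: every "receiver" line reports its rate in Gbits/sec.
sum_receiver_gbps() {
  awk '/receiver/ {
         # The bitrate is the field immediately before the "Gbits/sec" unit.
         for (i = 2; i <= NF; i++) if ($i == "Gbits/sec") total += $(i - 1)
       }
       END { printf "%.2f\n", total }'
}

sum_receiver_gbps <<'EOF'
[ 5] 0.00-40.04 sec 22.7 GBytes 4.87 Gbits/sec receiver
[ 5] 0.00-40.04 sec 23.5 GBytes 5.05 Gbits/sec receiver
[ 5] 0.00-40.04 sec 24.4 GBytes 5.24 Gbits/sec receiver
[ 5] 0.00-40.04 sec 34.5 GBytes 7.40 Gbits/sec receiver
[ 5] 0.00-40.04 sec 32.9 GBytes 7.05 Gbits/sec receiver
EOF
# prints 29.61
```

With the log files from the client script, the same function can be fed with `cat iperf_client_*.log | sum_receiver_gbps`.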