Slower bandwidth compared to sys-firewall
grote opened this issue · 13 comments
I am debugging why I don't get my full 1Gbps bandwidth on Qubes OS, which I (almost) get when booting Ubuntu from a USB flash drive. While doing so, I noticed that mirage-firewall provides worse performance than Qubes' default sys-firewall.
when using mirage-firewall: (figures not preserved in this text)
when using sys-firewall: (figures not preserved in this text)
Could it be that mirage-firewall has bandwidth limitations?
What about the bandwidth on sys-net? It would be better to measure with iperf3.
Alright, so I set up iperf3 in server mode on a machine in the local network connected via 1Gbps Ethernet. Then I ran three client tests against it, and the results are similar to what I saw before:
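For reference, a test like this can be reproduced with plain iperf3 commands roughly as follows (the server address is a placeholder; for mirage-firewall the client has to run in an AppVM attached to it, since iperf3 cannot run inside the unikernel itself):
$ # on the wired 1Gbps machine in the local network: run the iperf3 server
$ iperf3 -s
$ # from the VM under test (for mirage-firewall: an AppVM attached to it):
$ iperf3 -c 192.168.1.10 -t 10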
sys-net:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 963 MBytes 808 Mbits/sec 0 sender
[ 5] 0.00-10.04 sec 961 MBytes 804 Mbits/sec receiver
sys-firewall:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 709 MBytes 595 Mbits/sec 5 sender
[ 5] 0.00-10.04 sec 706 MBytes 590 Mbits/sec receiver
mirage-firewall:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 207 MBytes 174 Mbits/sec 0 sender
[ 5] 0.00-10.04 sec 205 MBytes 172 Mbits/sec receiver
It is a shame when you have a 1Gbps fiber link and can't fully utilize it :(
Possibly relevant:
If I turn on scatter-gather as suggested in QubesOS/qubes-issues#3510, I get:
with sys-firewall:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 997 MBytes 836 Mbits/sec 253 sender
[ 5] 0.00-10.04 sec 993 MBytes 830 Mbits/sec receiver
with mirage-firewall:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 203 MBytes 171 Mbits/sec 0 sender
[ 5] 0.00-10.04 sec 201 MBytes 168 Mbits/sec receiver
So this seems to fix the issue with sys-firewall (although note the number of retransmits), while mirage-firewall actually performs slightly worse than before.
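For reference, scatter-gather can be toggled per interface with ethtool; a minimal sketch, assuming the uplink interface in sys-net is eth0 (adjust the interface name to your setup):
$ # in sys-net: check the current offload settings
$ sudo ethtool -k eth0 | grep scatter-gather
$ # enable scatter-gather offload
$ sudo ethtool -K eth0 sg on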
hey, so my experience with MirageOS unikernels is that the OCaml optimizer "flambda" helps to a large degree. I have not tested this with the QubesOS firewall, but would you mind either:
- manually creating a fresh opam switch with
opam switch create 4.11.1+flambda
and compiling the firewall there, or
- if using Docker, using the
ocaml/opam:debian-10-ocaml-4.11-flambda
container (or, instead of Debian 10, whichever distribution you prefer)?
The resulting unikernel should be semantically equivalent, but allocate much less memory and thus be more performant.
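A minimal sketch of the opam route, assuming the mirage 3 build workflow qubes-mirage-firewall used around that time (exact steps may differ between releases):
$ opam switch create 4.11.1+flambda
$ eval $(opam env)
$ git clone https://github.com/mirage/qubes-mirage-firewall.git
$ cd qubes-mirage-firewall
$ opam install mirage        # pulls in the mirage configuration tool
$ mirage configure -t xen
$ make depend                # installs the unikernel's opam dependencies
$ make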
Another very suitable optimization is to use the best-fit allocation policy by passing --allocation-policy=best-fit to the unikernel, either at the configuration stage (mirage configure -t xen --allocation-policy=best-fit) or at runtime as boot arguments (qubes... kernelopts '--allocation-policy=best-fit').
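As an illustration of both variants, using qvm-prefs in dom0 as one common way to set kernelopts (the qube name mirage-firewall below is an assumption; substitute whatever your firewall qube is called):
$ # at configuration time, in the unikernel source tree:
$ mirage configure -t xen --allocation-policy=best-fit
$ # at runtime, from dom0, via the qube's kernel options (qube name is assumed):
$ qvm-prefs mirage-firewall kernelopts '--allocation-policy=best-fit'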
I'd be very interested to see the number matrix of: baseline (mirage-qubes-firewall, as above); best-fit; flambda; flambda + best-fit.
Thanks for your report including figures for comparison.
NB: I scheduled (and finished) the builds above. Please have a try with https://data.robur.coop/qubes-firewall-flambda/2020-12-05/, which I expect to be the fast one (the best-fit allocation policy was already enabled at configuration time),
and https://data.robur.coop/qubes-firewall/2020-12-05/ for a unikernel built with the "standard" OCaml compiler, but with best-fit enabled at configuration time.
Especially the first one (with flambda) would be interesting to see performance numbers for on your hardware.
Thanks, hannes
last release:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 237 MBytes 199 Mbits/sec 20 sender
[ 5] 0.00-10.04 sec 236 MBytes 197 Mbits/sec receiver
best-fit enabled:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 523 MBytes 438 Mbits/sec 99 sender
[ 5] 0.00-10.04 sec 520 MBytes 435 Mbits/sec receiver
flambda:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 531 MBytes 446 Mbits/sec 151 sender
[ 5] 0.00-10.04 sec 529 MBytes 443 Mbits/sec receiver
For comparison, here again is what sys-firewall gives me:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 854 MBytes 716 Mbits/sec 196 sender
[ 5] 0.00-10.04 sec 851 MBytes 711 Mbits/sec receiver
@grote thanks for your reported numbers. This month I plan to further analyze the bottlenecks of qubes-mirage-firewall, and will report back in this issue some graphs and more binaries to test. :) I'm glad that a factor of 2.5 is easily achieved by modern compiler features (that we should enable in future releases of qubes-mirage-firewall) :)
from #151 @palinp (with release 0.8.2)
The two PRs together are ready to merge according to my iperf3 tests. I now get the figures below. For TCP over 1 minute, the mirage fw CPU is at 100% and the linux CPU (sys-net) at around 70%, so there is plenty of room for improvement there. For UDP over 1 minute, the mirage CPU is at 100% and the linux CPU at around 90%. The linux fw baseline is the same as in #130; I just noticed more dropped packets for UDP with linux than with mirage:
[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 3.57 GBytes 510 Mbits/sec 529 sender
[ 5] 0.00-60.00 sec 3.56 GBytes 510 Mbits/sec receiver
[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -u -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-60.00 sec 4.61 GBytes 660 Mbits/sec 0.000 ms 0/3389750 (0%) sender
[ 5] 0.00-60.00 sec 4.61 GBytes 660 Mbits/sec 0.018 ms 785/3389697 (0.023%) receiver
from IRC (also with 0.8.2):
a minor difference between the 20220527 no-flambda build (699 Mbps) and the 20221014 with-flambda build (729 Mbps)
so, we're on a good track - but of course there's still room for improvement :)
Bump! Has this issue been resolved? Has anybody found any workarounds?
Dear @ihateprogramming88, we're actively looking into this issue, now that qubes-mirage-firewall has stabilized. It will take some more time and testing to figure out how to improve the performance. :) If you are interested in contributing, let us know.
Dear @hannesm, thanks for your response! I am happy to help :)
I tried to compare what the differences between linux and qubes-mirage-fw could be. What surprised me most is that linux shows a gigantic bandwidth with TCP but only a slightly better bandwidth with UDP.
If I switch off TCP Segmentation Offload (TSO), the sys-firewall AppVM has bandwidth of the same order of magnitude as UDP, or as TCP through mirage.
TSO is mainly useful for bandwidth tests like this, where you want to send a huge amount of data between two hosts.
I cannot estimate the amount of work needed to implement TSO in the mirage stack, but I think it would help with this issue :)
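For reference, the current offload state can be inspected and toggled per interface with ethtool inside the VM (the interface name eth0 matches the runs below but may differ in other setups):
$ # list offload settings; TSO appears as tcp-segmentation-offload
$ sudo ethtool -k eth0
$ # disable TSO for a test run, and re-enable it afterwards
$ sudo ethtool -K eth0 tso off
$ sudo ethtool -K eth0 tso on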
The following tests were done on the same laptop (AppVM <-> fw <-> AppVM) with 1 core for the fw, so they measure the internal Xen bandwidth (both runs use a linux sys-fw between the VMs; the first run leaves TSO untouched while the second disables TSO on one VM):
$ sudo ethtool -K eth0 tso on
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[ 5] local 10.137.0.21 port 35308 connected to 10.137.0.4 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 368 MBytes 3.08 Gbits/sec 0 1.90 MBytes
[ 5] 1.00-2.00 sec 380 MBytes 3.19 Gbits/sec 0 1.90 MBytes
[ 5] 2.00-3.00 sec 375 MBytes 3.15 Gbits/sec 0 1.90 MBytes
[ 5] 3.00-4.00 sec 369 MBytes 3.09 Gbits/sec 0 1.90 MBytes
[ 5] 4.00-5.00 sec 370 MBytes 3.10 Gbits/sec 0 1.90 MBytes
[ 5] 5.00-6.00 sec 369 MBytes 3.09 Gbits/sec 0 1.90 MBytes
[ 5] 6.00-7.00 sec 374 MBytes 3.14 Gbits/sec 0 1.90 MBytes
[ 5] 7.00-8.00 sec 372 MBytes 3.12 Gbits/sec 0 1.90 MBytes
[ 5] 8.00-9.00 sec 369 MBytes 3.09 Gbits/sec 0 1.90 MBytes
[ 5] 9.00-10.00 sec 372 MBytes 3.12 Gbits/sec 0 1.90 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 3.63 GBytes 3.12 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 3.63 GBytes 3.12 Gbits/sec receiver
iperf Done.
$ sudo ethtool -K eth0 tso off
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[ 5] local 10.137.0.21 port 33160 connected to 10.137.0.4 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 85.6 MBytes 718 Mbits/sec 0 1.09 MBytes
[ 5] 1.00-2.00 sec 85.0 MBytes 713 Mbits/sec 0 1.09 MBytes
[ 5] 2.00-3.00 sec 86.2 MBytes 723 Mbits/sec 0 1.15 MBytes
[ 5] 3.00-4.00 sec 85.0 MBytes 713 Mbits/sec 0 1.15 MBytes
[ 5] 4.00-5.00 sec 83.8 MBytes 703 Mbits/sec 0 1.15 MBytes
[ 5] 5.00-6.00 sec 85.0 MBytes 713 Mbits/sec 0 1.21 MBytes
[ 5] 6.00-7.00 sec 83.8 MBytes 703 Mbits/sec 0 1.21 MBytes
[ 5] 7.00-8.00 sec 83.8 MBytes 703 Mbits/sec 0 1.21 MBytes
[ 5] 8.00-9.00 sec 85.0 MBytes 713 Mbits/sec 0 1.21 MBytes
[ 5] 9.00-10.00 sec 85.0 MBytes 713 Mbits/sec 0 1.21 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 848 MBytes 711 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 846 MBytes 709 Mbits/sec receiver
iperf Done.