mirage/qubes-mirage-firewall

Slower bandwidth compared to sys-firewall

grote opened this issue · 13 comments

grote commented

I am debugging why I don't get my full 1Gbps bandwidth on Qubes OS, which I (almost) get when booting Ubuntu from a USB flash drive. While doing so, I noticed that mirage-firewall performs worse than Qubes' default firewall.

when using mirage-firewall:

[Screenshot 2020-12-04: Ookla Speedtest result with mirage-firewall]

when using sys-firewall:

[Screenshot 2020-12-04: Ookla Speedtest result with sys-firewall]

Could it be that mirage-firewall has bandwidth limitations?

What about the bandwidth on sys-net? Better to measure with iperf3.

grote commented

Alright, so I set up iperf3 in server mode on a machine in the local network, connected via 1Gbps ethernet. Then I ran three client tests against it, and the results are in line with the speedtests above:

sys-net:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   963 MBytes   808 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   961 MBytes   804 Mbits/sec                  receiver

sys-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   709 MBytes   595 Mbits/sec    5             sender
[  5]   0.00-10.04  sec   706 MBytes   590 Mbits/sec                  receiver

mirage-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   207 MBytes   174 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   205 MBytes   172 Mbits/sec                  receiver

It is a shame when you have a 1Gbps fiber link and can't fully utilize it :(
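
For anyone reproducing these measurements, a minimal sketch of the setup (the server address 192.168.1.10 is illustrative):

# on the machine in the local network: run iperf3 in server mode
$ iperf3 -s

# in the qube under test (sys-net, sys-firewall, or an AppVM behind
# mirage-firewall): run a 10-second TCP client test against it
$ iperf3 -c 192.168.1.10 -t 10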

grote commented

If I turn on Scatter Gather, as suggested in QubesOS/qubes-issues#3510, I get:

with sys-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   997 MBytes   836 Mbits/sec  253             sender
[  5]   0.00-10.04  sec   993 MBytes   830 Mbits/sec                  receiver

with mirage-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   203 MBytes   171 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   201 MBytes   168 Mbits/sec                  receiver

So this seems to fix the issue for sys-firewall (though note the number of retries), but mirage-firewall actually performs slightly worse.
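
For reference, a sketch of how Scatter Gather can typically be checked and toggled per interface with ethtool (eth0 is illustrative; whether the setting is available depends on the driver):

# check the current offload settings of the interface
$ ethtool -k eth0 | grep scatter-gather

# enable scatter-gather
$ sudo ethtool -K eth0 sg on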

hey, so my experience with MirageOS unikernels is that the OCaml optimizer "flambda" helps a great deal. I have not tested this with the QubesOS firewall, but would you mind either

  • manually creating a fresh opam switch (opam switch create 4.11.1+flambda) and compiling the firewall there
  • if using Docker, using the ocaml/opam:debian-10-ocaml-4.11-flambda container (or, instead of debian-10, whichever distribution you prefer)

The resulting unikernel should be semantically equivalent, but allocate much less memory and thus be more performant.

Another worthwhile optimization is to use the best-fit allocation policy by passing --allocation-policy=best-fit to the unikernel -- either at the configuration stage (mirage configure -t xen --allocation-policy=best-fit) or at runtime as a boot argument (qubes... kernelopts '--allocation-policy=best-fit').
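
Putting both suggestions together, a sketch of a build using the standard mirage 3 workflow (the compiler version is illustrative, and the exact build steps may differ for your checkout):

# fresh opam switch with the flambda-enabled compiler
$ opam switch create 4.11.1+flambda
$ eval $(opam env)

# configure the unikernel for Xen with the best-fit allocation policy,
# then fetch the dependencies and build
$ mirage configure -t xen --allocation-policy=best-fit
$ make depend
$ mirage build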

I'd be very interested to see the number matrix of: baseline (qubes-mirage-firewall, as above); best-fit; flambda; flambda + best-fit.

Thanks for your report including figures for comparison.

NB: I scheduled (and finished) the builds above. Please have a try with https://data.robur.coop/qubes-firewall-flambda/2020-12-05/, which I expect to be the fast one (the best-fit allocation policy was already enabled at configuration time),

and https://data.robur.coop/qubes-firewall/2020-12-05/ for a unikernel with the "standard" OCaml compiler, but best-fit enabled at configuration time.

It would be especially interesting to see the performance numbers of the first one (with flambda) on your hardware.

Thanks, hannes

grote commented

last release

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   237 MBytes   199 Mbits/sec   20             sender
[  5]   0.00-10.04  sec   236 MBytes   197 Mbits/sec                  receiver

best-fit enabled

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   523 MBytes   438 Mbits/sec   99             sender
[  5]   0.00-10.04  sec   520 MBytes   435 Mbits/sec                  receiver

flambda

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   531 MBytes   446 Mbits/sec  151             sender
[  5]   0.00-10.04  sec   529 MBytes   443 Mbits/sec                  receiver

For comparison, here again is what sys-firewall gives me:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   854 MBytes   716 Mbits/sec  196             sender
[  5]   0.00-10.04  sec   851 MBytes   711 Mbits/sec                  receiver

@grote thanks for your reported numbers. This month I plan to further analyze the bottlenecks of qubes-mirage-firewall, and will report back in this issue with some graphs and more binaries to test. :) I'm glad that a factor of 2.5 is easily achieved by modern compiler features (which we should enable in future releases of qubes-mirage-firewall) :)

from #151 @palinp (with release 0.8.2)

The two PRs together are ready to merge according to my iperf3 tests. I now have the following figures. For TCP over 1 minute, the mirage fw CPU is at 100% and the Linux CPU (sys-net) at around 70% (there is plenty of room for improvement there); for UDP over 1 minute, the mirage CPU is at 100% and the Linux CPU at around 90%. The Linux fw baseline is the same as in #130; I just noticed more dropped packets for UDP with Linux than with mirage:

[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  3.57 GBytes   510 Mbits/sec  529             sender
[  5]   0.00-60.00  sec  3.56 GBytes   510 Mbits/sec                  receiver

[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -u -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-60.00  sec  4.61 GBytes   660 Mbits/sec  0.000 ms  0/3389750 (0%)  sender
[  5]   0.00-60.00  sec  4.61 GBytes   660 Mbits/sec  0.018 ms  785/3389697 (0.023%)  receiver

from IRC (also with 0.8.2):
a minor difference between the 20220527 no-flambda build (699 Mbps) and the 20221014 with-flambda build (729 Mbps)

so, we're on the right track - but of course there's still room for improvement :)

Bump! Has this issue been resolved? Has anybody found any workarounds?

Dear @ihateprogramming88, we're actively looking into this issue, now that qubes-mirage-firewall has stabilized. It will take some more time and testing to figure out how to improve the performance. :) If you are interested in contributing, let us know.

Dear @hannesm, thanks for your response! I am happy to help :)

I tried to work out the differences between Linux and qubes-mirage-fw. What surprised me the most is that Linux shows a gigantic bandwidth with TCP but only a slightly better bandwidth with UDP.

If I turn off TCP Segmentation Offload, the sys-firewall AppVM has bandwidth in the same order of magnitude as with UDP, or as with TCP through mirage.
TSO is useful for bandwidth tests like these, where you want to send a huge amount of data between two hosts.
I cannot estimate the amount of work needed to implement TSO in the mirage stack, but I think it would help with this issue :)
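
To check whether TSO is currently active on an interface (eth0 is illustrative):

# list the offload features and filter for TSO
$ ethtool -k eth0 | grep tcp-segmentation-offload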

The following runs were done on the same laptop (AppVM <-> fw <-> AppVM) with 1 core for the fw, so they measure the internal Xen bandwidth (both with a Linux sys-fw between the VMs; the first run leaves TSO untouched while the second disables TSO on one VM):

$ sudo ethtool -K eth0 tso on
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[  5] local 10.137.0.21 port 35308 connected to 10.137.0.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   368 MBytes  3.08 Gbits/sec    0   1.90 MBytes       
[  5]   1.00-2.00   sec   380 MBytes  3.19 Gbits/sec    0   1.90 MBytes       
[  5]   2.00-3.00   sec   375 MBytes  3.15 Gbits/sec    0   1.90 MBytes       
[  5]   3.00-4.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   4.00-5.00   sec   370 MBytes  3.10 Gbits/sec    0   1.90 MBytes       
[  5]   5.00-6.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   6.00-7.00   sec   374 MBytes  3.14 Gbits/sec    0   1.90 MBytes       
[  5]   7.00-8.00   sec   372 MBytes  3.12 Gbits/sec    0   1.90 MBytes       
[  5]   8.00-9.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   9.00-10.00  sec   372 MBytes  3.12 Gbits/sec    0   1.90 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.63 GBytes  3.12 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  3.63 GBytes  3.12 Gbits/sec                  receiver

iperf Done.
$ sudo ethtool -K eth0 tso off
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[  5] local 10.137.0.21 port 33160 connected to 10.137.0.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  85.6 MBytes   718 Mbits/sec    0   1.09 MBytes       
[  5]   1.00-2.00   sec  85.0 MBytes   713 Mbits/sec    0   1.09 MBytes       
[  5]   2.00-3.00   sec  86.2 MBytes   723 Mbits/sec    0   1.15 MBytes       
[  5]   3.00-4.00   sec  85.0 MBytes   713 Mbits/sec    0   1.15 MBytes       
[  5]   4.00-5.00   sec  83.8 MBytes   703 Mbits/sec    0   1.15 MBytes       
[  5]   5.00-6.00   sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
[  5]   6.00-7.00   sec  83.8 MBytes   703 Mbits/sec    0   1.21 MBytes       
[  5]   7.00-8.00   sec  83.8 MBytes   703 Mbits/sec    0   1.21 MBytes       
[  5]   8.00-9.00   sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
[  5]   9.00-10.00  sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   848 MBytes   711 Mbits/sec    0             sender
[  5]   0.00-10.01  sec   846 MBytes   709 Mbits/sec                  receiver

iperf Done.