istio/ztunnel

performance: upstream and downstream will never run concurrently

howardjohn opened this issue · 4 comments

copy_bidirectional uses tokio::join for the copy from upstream->downstream and vice versa. join polls both futures on the same task, so the two copies are interleaved but never run in parallel; it's impossible for them to happen at the same time on different threads.

Intuitively, running the two directions in parallel seems like it should help. Not on a simple call-response workload, but with a continuous flow of data in both directions it should.

I put together a prototype, however, and do not see any benefits:

HBONE
Master
DEST           CLIENT  QPS   CONS  DUR  PAYLOAD  SUCCESS  THROUGHPUT   P50      P90      P99
fortio-server  fortio  0     1     5    0        96743    19348.29qps  0.048ms  0.066ms  0.113ms
fortio-server  fortio  0     1     5    1024     83755    16750.66qps  0.055ms  0.076ms  0.138ms
fortio-server  fortio  2000  1     5    0        9998     1999.53qps   0.094ms  0.143ms  0.277ms
fortio-server  fortio  2000  1     5    1024     10000    1999.89qps   0.106ms  0.156ms  0.289ms
fortio-server  fortio  0     2     5    0        146377   29274.81qps  0.061ms  0.093ms  0.180ms
fortio-server  fortio  0     2     5    1024     131639   26327.32qps  0.071ms  0.096ms  0.176ms
fortio-server  fortio  2000  2     5    0        10000    1999.50qps   0.118ms  0.175ms  0.331ms
fortio-server  fortio  2000  2     5    1024     10000    1999.59qps   0.134ms  0.198ms  0.358ms
fortio-server  fortio  0     4     5    0        222310   44460.78qps  0.085ms  0.117ms  0.192ms
fortio-server  fortio  0     4     5    1024     179137   35826.45qps  0.105ms  0.148ms  0.264ms
fortio-server  fortio  2000  4     5    0        9996     1998.58qps   0.125ms  0.183ms  0.304ms
fortio-server  fortio  2000  4     5    1024     9998     1998.93qps   0.150ms  0.236ms  0.606ms
fortio-server  fortio  0     64    5    0        407907   81573.11qps  0.744ms  1.430ms  1.965ms
fortio-server  fortio  0     64    5    1024     265418   53074.42qps  1.335ms  1.896ms  2.745ms
fortio-server  fortio  2000  64    5    0        9984     1987.19qps   0.142ms  0.200ms  0.296ms
fortio-server  fortio  2000  64    5    1024     9984     1987.05qps   0.177ms  0.258ms  0.385ms
ID   Interval          Transfer      Bitrate
[  0]   0.00..10.00 sec  10.46 GiB   8.99 Gbits/sec        sender
[  0]   0.00..10.00 sec  10.41 GiB   8.94 Gbits/sec        receiver

Spawning
DEST           CLIENT  QPS   CONS  DUR  PAYLOAD  SUCCESS  THROUGHPUT   P50      P90      P99
fortio-server  fortio  0     1     5    0        95146    19028.90qps  0.048ms  0.067ms  0.112ms
fortio-server  fortio  0     1     5    1024     81596    16318.86qps  0.057ms  0.078ms  0.137ms
fortio-server  fortio  2000  1     5    0        9998     1999.40qps   0.094ms  0.139ms  0.250ms
fortio-server  fortio  2000  1     5    1024     9998     1999.54qps   0.107ms  0.154ms  0.271ms
fortio-server  fortio  0     2     5    0        156544   31306.96qps  0.060ms  0.085ms  0.138ms
fortio-server  fortio  0     2     5    1024     128630   25725.32qps  0.073ms  0.099ms  0.186ms
fortio-server  fortio  2000  2     5    0        9998     1999.36qps   0.121ms  0.179ms  0.316ms
fortio-server  fortio  2000  2     5    1024     10000    1999.42qps   0.135ms  0.200ms  0.371ms
fortio-server  fortio  0     4     5    0        218797   43758.22qps  0.086ms  0.119ms  0.196ms
fortio-server  fortio  0     4     5    1024     182502   36499.22qps  0.102ms  0.145ms  0.268ms
fortio-server  fortio  2000  4     5    0        9998     1998.75qps   0.137ms  0.235ms  0.479ms
fortio-server  fortio  2000  4     5    1024     9998     1998.85qps   0.164ms  0.287ms  0.631ms
fortio-server  fortio  0     64    5    0        400404   80069.12qps  0.755ms  1.469ms  1.973ms
fortio-server  fortio  0     64    5    1024     286835   57358.42qps  1.231ms  1.860ms  2.070ms
fortio-server  fortio  2000  64    5    0        9984     1987.07qps   0.144ms  0.212ms  0.364ms
fortio-server  fortio  2000  64    5    1024     9984     1986.99qps   0.180ms  0.258ms  0.411ms
[  0]   0.00..10.00 sec  10.77 GiB   9.25 Gbits/sec        sender
[  0]   0.00..10.00 sec  10.71 GiB   9.20 Gbits/sec        receiver



TCP
Master
DEST           CLIENT  QPS   CONS  DUR  PAYLOAD  SUCCESS  THROUGHPUT    P50      P90      P99
fortio-server  fortio  0     1     3    0        93706    31234.58qps   0.029ms  0.040ms  0.075ms
fortio-server  fortio  0     1     3    64000    18341    6113.28qps    0.138ms  0.267ms  0.395ms
fortio-server  fortio  2000  1     3    0        5998     1998.92qps    0.062ms  0.103ms  0.245ms
fortio-server  fortio  2000  1     3    64000    5998     1999.05qps    0.181ms  0.364ms  0.595ms
fortio-server  fortio  0     64    3    0        368849   122926.89qps  0.507ms  0.682ms  1.397ms
fortio-server  fortio  0     64    3    64000    43652    14530.98qps   3.361ms  9.953ms  19.560ms
fortio-server  fortio  2000  64    3    0        5952     1978.43qps    0.098ms  0.144ms  0.316ms
fortio-server  fortio  2000  64    3    64000    5952     1978.35qps    0.284ms  0.645ms  1.627ms

[SUM]   0.00..10.00 sec  25.64 GiB   22.02 Gbits/sec        sender
[SUM]   0.00..9.97 sec  27.77 GiB   23.93 Gbits/sec        receiver

Spawning
DEST           CLIENT  QPS   CONS  DUR  PAYLOAD  SUCCESS  THROUGHPUT    P50      P90      P99
fortio-server  fortio  0     1     3    0        86532    28843.36qps   0.031ms  0.044ms  0.094ms
fortio-server  fortio  0     1     3    64000    14891    4963.41qps    0.164ms  0.330ms  0.640ms
fortio-server  fortio  2000  1     3    0        5998     1998.99qps    0.064ms  0.106ms  0.196ms
fortio-server  fortio  2000  1     3    64000    6000     1999.32qps    0.185ms  0.376ms  0.678ms
fortio-server  fortio  0     64    3    0        360397   120090.70qps  0.508ms  0.698ms  1.640ms
fortio-server  fortio  0     64    3    64000    44472    14802.99qps   3.186ms  9.915ms  19.528ms
fortio-server  fortio  2000  64    3    0        5952     1978.83qps    0.093ms  0.130ms  0.192ms
fortio-server  fortio  2000  64    3    64000    5952     1978.39qps    0.273ms  0.568ms  0.919ms

[SUM]   0.00..10.00 sec  25.04 GiB   21.51 Gbits/sec        sender
[SUM]   0.00..9.96 sec  28.12 GiB   24.25 Gbits/sec        receiver

We should investigate more

Did you push the prototype code?

It shouldn't be terribly hard to spawn each direction as its own task and join on the handles instead, if we want real parallelism.

A little more expensive for trivial cases, but probably worth it for all the others.

I think we can collect multiple handles from tokio::spawn and then await them. I presume something like that is what @howardjohn already tried but didn't see any benefit from.

> I think we can collect multiple handles from tokio::spawn and then await them. I presume something like that is what @howardjohn already tried but didn't see any benefit from.

Yeah - I misread. It probably doesn't make much difference because we already spawn the per-workload handler in a thread and are not remotely CPU bound even under load - distributing this specific operation across threads won't help much (and might make it easier for a greedy workload to starve other workloads on the node).

In general, sticking with a one-thread-per-conn-handler-instance model seems best.