performance: upstream and downstream will never run concurrently
howardjohn opened this issue · 4 comments
copy_bidirectional uses tokio::join! for the copy from upstream->downstream and vice versa. join! polls both futures on a single task, so it's impossible for the two copies to run at the same time on separate threads.
Intuitively, it seems like parallelism here should help. For a simple call-response workload it won't, but with a continuous flow of data in both directions it should.
I put together a prototype, however, and do not see any benefits:
HBONE
Master
DEST CLIENT QPS CONS DUR(s) PAYLOAD(B) SUCCESS THROUGHPUT P50 P90 P99
fortio-server fortio 0 1 5 0 96743 19348.29qps 0.048ms 0.066ms 0.113ms
fortio-server fortio 0 1 5 1024 83755 16750.66qps 0.055ms 0.076ms 0.138ms
fortio-server fortio 2000 1 5 0 9998 1999.53qps 0.094ms 0.143ms 0.277ms
fortio-server fortio 2000 1 5 1024 10000 1999.89qps 0.106ms 0.156ms 0.289ms
fortio-server fortio 0 2 5 0 146377 29274.81qps 0.061ms 0.093ms 0.180ms
fortio-server fortio 0 2 5 1024 131639 26327.32qps 0.071ms 0.096ms 0.176ms
fortio-server fortio 2000 2 5 0 10000 1999.50qps 0.118ms 0.175ms 0.331ms
fortio-server fortio 2000 2 5 1024 10000 1999.59qps 0.134ms 0.198ms 0.358ms
fortio-server fortio 0 4 5 0 222310 44460.78qps 0.085ms 0.117ms 0.192ms
fortio-server fortio 0 4 5 1024 179137 35826.45qps 0.105ms 0.148ms 0.264ms
fortio-server fortio 2000 4 5 0 9996 1998.58qps 0.125ms 0.183ms 0.304ms
fortio-server fortio 2000 4 5 1024 9998 1998.93qps 0.150ms 0.236ms 0.606ms
fortio-server fortio 0 64 5 0 407907 81573.11qps 0.744ms 1.430ms 1.965ms
fortio-server fortio 0 64 5 1024 265418 53074.42qps 1.335ms 1.896ms 2.745ms
fortio-server fortio 2000 64 5 0 9984 1987.19qps 0.142ms 0.200ms 0.296ms
fortio-server fortio 2000 64 5 1024 9984 1987.05qps 0.177ms 0.258ms 0.385ms
ID Interval Transfer Bitrate
[ 0] 0.00..10.00 sec 10.46 GiB 8.99 Gbits/sec sender
[ 0] 0.00..10.00 sec 10.41 GiB 8.94 Gbits/sec receiver
Spawning
DEST CLIENT QPS CONS DUR(s) PAYLOAD(B) SUCCESS THROUGHPUT P50 P90 P99
fortio-server fortio 0 1 5 0 95146 19028.90qps 0.048ms 0.067ms 0.112ms
fortio-server fortio 0 1 5 1024 81596 16318.86qps 0.057ms 0.078ms 0.137ms
fortio-server fortio 2000 1 5 0 9998 1999.40qps 0.094ms 0.139ms 0.250ms
fortio-server fortio 2000 1 5 1024 9998 1999.54qps 0.107ms 0.154ms 0.271ms
fortio-server fortio 0 2 5 0 156544 31306.96qps 0.060ms 0.085ms 0.138ms
fortio-server fortio 0 2 5 1024 128630 25725.32qps 0.073ms 0.099ms 0.186ms
fortio-server fortio 2000 2 5 0 9998 1999.36qps 0.121ms 0.179ms 0.316ms
fortio-server fortio 2000 2 5 1024 10000 1999.42qps 0.135ms 0.200ms 0.371ms
fortio-server fortio 0 4 5 0 218797 43758.22qps 0.086ms 0.119ms 0.196ms
fortio-server fortio 0 4 5 1024 182502 36499.22qps 0.102ms 0.145ms 0.268ms
fortio-server fortio 2000 4 5 0 9998 1998.75qps 0.137ms 0.235ms 0.479ms
fortio-server fortio 2000 4 5 1024 9998 1998.85qps 0.164ms 0.287ms 0.631ms
fortio-server fortio 0 64 5 0 400404 80069.12qps 0.755ms 1.469ms 1.973ms
fortio-server fortio 0 64 5 1024 286835 57358.42qps 1.231ms 1.860ms 2.070ms
fortio-server fortio 2000 64 5 0 9984 1987.07qps 0.144ms 0.212ms 0.364ms
fortio-server fortio 2000 64 5 1024 9984 1986.99qps 0.180ms 0.258ms 0.411ms
[ 0] 0.00..10.00 sec 10.77 GiB 9.25 Gbits/sec sender
[ 0] 0.00..10.00 sec 10.71 GiB 9.20 Gbits/sec receiver
TCP
Master
DEST CLIENT QPS CONS DUR(s) PAYLOAD(B) SUCCESS THROUGHPUT P50 P90 P99
fortio-server fortio 0 1 3 0 93706 31234.58qps 0.029ms 0.040ms 0.075ms
fortio-server fortio 0 1 3 64000 18341 6113.28qps 0.138ms 0.267ms 0.395ms
fortio-server fortio 2000 1 3 0 5998 1998.92qps 0.062ms 0.103ms 0.245ms
fortio-server fortio 2000 1 3 64000 5998 1999.05qps 0.181ms 0.364ms 0.595ms
fortio-server fortio 0 64 3 0 368849 122926.89qps 0.507ms 0.682ms 1.397ms
fortio-server fortio 0 64 3 64000 43652 14530.98qps 3.361ms 9.953ms 19.560ms
fortio-server fortio 2000 64 3 0 5952 1978.43qps 0.098ms 0.144ms 0.316ms
fortio-server fortio 2000 64 3 64000 5952 1978.35qps 0.284ms 0.645ms 1.627ms
[SUM] 0.00..10.00 sec 25.64 GiB 22.02 Gbits/sec sender
[SUM] 0.00..9.97 sec 27.77 GiB 23.93 Gbits/sec receiver
Spawning
DEST CLIENT QPS CONS DUR(s) PAYLOAD(B) SUCCESS THROUGHPUT P50 P90 P99
fortio-server fortio 0 1 3 0 86532 28843.36qps 0.031ms 0.044ms 0.094ms
fortio-server fortio 0 1 3 64000 14891 4963.41qps 0.164ms 0.330ms 0.640ms
fortio-server fortio 2000 1 3 0 5998 1998.99qps 0.064ms 0.106ms 0.196ms
fortio-server fortio 2000 1 3 64000 6000 1999.32qps 0.185ms 0.376ms 0.678ms
fortio-server fortio 0 64 3 0 360397 120090.70qps 0.508ms 0.698ms 1.640ms
fortio-server fortio 0 64 3 64000 44472 14802.99qps 3.186ms 9.915ms 19.528ms
fortio-server fortio 2000 64 3 0 5952 1978.83qps 0.093ms 0.130ms 0.192ms
fortio-server fortio 2000 64 3 64000 5952 1978.39qps 0.273ms 0.568ms 0.919ms
[SUM] 0.00..10.00 sec 25.04 GiB 21.51 Gbits/sec sender
[SUM] 0.00..9.96 sec 28.12 GiB 24.25 Gbits/sec receiver
We should investigate more
Did you push the prototype code?
It shouldn't be terribly hard to use tokio::spawn and join on the handles instead, if we want real parallelism. A little more expensive for trivial cases, but probably worth it for all the others.
I think we can collect multiple handles from tokio::spawn and then await them. I presume something like that is what @howardjohn already tried but didn't see any benefit from.
Yeah - I misread. It probably doesn't make much difference: we already spawn the per-workload handler in a thread and are not remotely CPU-bound even under load, so distributing this specific operation across threads won't help much (and might make it easier for a greedy workload to starve other workloads on the node).
In general, sticking with a one-thread-per-conn-handler-instance model seems best.