[WebRTC] investigate data-channels-flow-control example throughput performance issue
rainliu opened this issue · 13 comments
Pion achieves more than 500 Mbps:
Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
2021/09/25 13:27:29 OnOpen: data-824638619994. Start sending a series of 1024-byte packets as fast as it can
2021/09/25 13:27:29 OnOpen: data-824636958938. Start receiving data
2021/09/25 13:27:30 Throughput: 570.646 Mbps
2021/09/25 13:27:31 Throughput: 569.753 Mbps
2021/09/25 13:27:32 Throughput: 573.001 Mbps
2021/09/25 13:27:33 Throughput: 572.452 Mbps
2021/09/25 13:27:34 Throughput: 571.297 Mbps
2021/09/25 13:27:35 Throughput: 569.525 Mbps
2021/09/25 13:27:36 Throughput: 567.463 Mbps
...
but webrtc-rs only reaches around 13 Mbps:
Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 12.990 Mbps
Throughput: 13.698 Mbps
Throughput: 13.559 Mbps
Throughput: 13.345 Mbps
Throughput: 13.565 Mbps
Throughput: 13.582 Mbps
Building with cargo build --release --example data-channels-flow-control improves performance, but it is still not comparable to Pion.
./target/release/examples/data-channels-flow-control
Press ctlr-c to stop
Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 175.556 Mbps
Throughput: 106.104 Mbps
Throughput: 76.986 Mbps
Throughput: 61.450 Mbps
Throughput: 51.632 Mbps
Throughput: 44.797 Mbps
Throughput: 39.733 Mbps
Throughput: 35.619 Mbps
Throughput: 32.330 Mbps
Throughput: 29.491 Mbps
Throughput: 43.142 Mbps
Throughput: 48.350 Mbps
Throughput: 46.386 Mbps
Throughput: 44.221 Mbps
Throughput: 48.071 Mbps
Throughput: 55.550 Mbps
Throughput: 53.980 Mbps
Throughput: 52.263 Mbps
This is probably a tokio performance limit.
Here is another benchmark:
goroutines: 3.22234675s total, 3.222346ms avg per iteration
rust_threads: 16.980509645s total, 16.980509ms avg per iteration
rust_tokio: 9.56997204s total, 9.569972ms avg per iteration
rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration
https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/
@rainliu I ran some long-running benchmarks comparing Go (Pion) and Rust; maybe they will help.
Go(Pion)
From the start of the benchmark, throughput grew to 844 Mbps:
17:49:06 Throughput: 721.371 Mbps
17:49:07 Throughput: 727.991 Mbps
17:49:08 Throughput: 743.665 Mbps
...
17:49:39 Throughput: 842.728 Mbps
17:49:40 Throughput: 843.339 Mbps
17:49:41 Throughput: 843.672 Mbps
17:49:42 Throughput: 844.272 Mbps
17:49:43 Throughput: 844.782 Mbps
17:49:44 Throughput: 844.855 Mbps
After that, an error was thrown:
mux ERROR: 17:49:45 mux: ending readLoop dispatch error packetio.Buffer is full, discarding write
and throughput started to drop steadily without settling at a stable point.
I stopped the benchmark at
18:44:45 Throughput: 9.966 Mbps
Rust
From the start of the benchmark, throughput was
Throughput: 229.521 Mbps
Throughput: 231.489 Mbps
Throughput: 231.780 Mbps
Throughput: 231.662 Mbps
Throughput: 231.965 Mbps
After that, it started to drop and reached its lowest point:
Throughput: 23.023 Mbps
Throughput: 22.849 Mbps
Throughput: 22.677 Mbps
After that, it slowly recovered and fluctuated between 41.511 Mbps and 66.436 Mbps.
CPU/RAM
Go
RAM 22.304 MB
CPU ~1%
Rust
RAM 162 MB (stopped at that point, without being reclaimed)
CPU ~120%
@whans Those benchmarks were run on file I/O, not on sockets. Linux behaves differently for files and sockets.
If you try std's file I/O inside an async block or task::block_in_place, you will get very fast results.
That is because Linux uses read-ahead for files.
Also, the benchmarks were run on /dev/urandom and /dev/null, which are in-memory files.
@rainliu I suppose the example here uses a socket connection.
@vitdevelop thanks
I compared std::net::UdpSocket with tokio::net::UdpSocket:
std::net::UdpSocket is almost twice as fast as tokio::net::UdpSocket.
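For anyone who wants to reproduce the std half of that comparison, here is a minimal sketch of a loopback throughput measurement using only std::net::UdpSocket. The helper names (`mbps`, `udp_loopback`), the 200 ms read timeout, and the packet counts are assumptions for illustration. This is not the benchmark used above. A tokio::net::UdpSocket variant would need the tokio crate and an async runtime, so it is not shown.

```rust
use std::net::UdpSocket;
use std::thread;
use std::time::{Duration, Instant};

/// Convert a byte count over an elapsed time into megabits per second.
/// Returns 0.0 when no time has elapsed, to avoid dividing by zero.
fn mbps(bytes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return 0.0;
    }
    (bytes as f64 * 8.0) / secs / 1_000_000.0
}

/// Send `count` packets of `size` bytes over loopback. Returns the bytes the
/// receiver actually saw and the time from start to the last received packet.
fn udp_loopback(count: usize, size: usize) -> std::io::Result<(u64, Duration)> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    // Hypothetical timeout: UDP may drop packets, so the receiver gives up
    // after 200 ms of silence instead of waiting forever.
    receiver.set_read_timeout(Some(Duration::from_millis(200)))?;
    let addr = receiver.local_addr()?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    let expected = (count * size) as u64;

    let start = Instant::now();
    let handle = thread::spawn(move || {
        let mut buf = vec![0u8; 65_536];
        let mut total = 0u64;
        let mut last = start;
        while total < expected {
            match receiver.recv(&mut buf) {
                Ok(n) => {
                    total += n as u64;
                    last = Instant::now();
                }
                Err(_) => break, // timed out: assume the rest was dropped
            }
        }
        (total, last.duration_since(start))
    });

    let payload = vec![0u8; size];
    for _ in 0..count {
        sender.send_to(&payload, addr)?;
    }
    Ok(handle.join().expect("receiver thread panicked"))
}

fn main() -> std::io::Result<()> {
    let (bytes, elapsed) = udp_loopback(100_000, 1024)?;
    println!("Throughput: {:.3} Mbps", mbps(bytes, elapsed));
    Ok(())
}
```

The elapsed time is taken at the last received packet rather than at thread exit, so the read timeout does not count against the measured throughput.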
thanks @vitdevelop and @whans for the benchmarking.
Looks like we need to put some effort into profiling the hotspots/bottlenecks.
add tokio console to check the schedule issue
@whans thanks for tokio-console, awesome tool.
I tried to check busy/idle times for tasks with tokio-console, but it could not connect to console_subscriber.
I then figured out that I can connect before the offer/answer exchange, so I added some tokio::time::sleep points to get a picture of what happens.
Once the offer/answer exchange starts to execute, the console hangs.
- 10 sec before create_offerer
- 3 sec before create_answerer
- 3 sec before create_offer and set_remote_description
- 3 sec before create_answer and set_remote_description
@vitdevelop
You need to slow down the packet sending rate.
Add a sleep in the sending task.
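As a rough sketch of what "add a sleep in the sending task" could look like, the snippet below paces a send loop to a target bit rate. It uses std::thread::sleep to stay dependency-free. In the actual async example the equivalent call would presumably be tokio::time::sleep. The names `pacing_interval` and `send_paced` and the 8.192 Mbps target are hypothetical and are not part of webrtc-rs.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Delay between packets so that packets of `packet_size` bytes average
/// `target_mbps`. Assumes `target_mbps` is positive.
fn pacing_interval(packet_size: usize, target_mbps: f64) -> Duration {
    let bits = packet_size as f64 * 8.0;
    Duration::from_secs_f64(bits / (target_mbps * 1_000_000.0))
}

/// Call `send` `count` times, sleeping between calls to hold the target rate.
/// Returns the total time spent.
fn send_paced<F: FnMut(&[u8])>(
    mut send: F,
    packet: &[u8],
    count: usize,
    target_mbps: f64,
) -> Duration {
    let interval = pacing_interval(packet.len(), target_mbps);
    let start = Instant::now();
    for i in 0..count {
        send(packet);
        // Sleep until the scheduled time of the next packet rather than for a
        // fixed interval, so the time spent in `send` does not accumulate as drift.
        let next = start + interval * (i as u32 + 1);
        if let Some(wait) = next.checked_duration_since(Instant::now()) {
            thread::sleep(wait);
        }
    }
    start.elapsed()
}

fn main() {
    let packet = [0u8; 1024];
    let mut sent = 0usize;
    // Stand-in for the data channel send call: here we only count bytes.
    let elapsed = send_paced(|p| sent += p.len(), &packet, 100, 8.192);
    println!("sent {} bytes in {:?}", sent, elapsed);
}
```

With 1024-byte packets, a target of 8.192 Mbps works out to roughly one packet per millisecond, which should leave enough idle time for a diagnostic tool to keep up.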
@vitdevelop, thanks for the finding. Could you submit a PR that adds tokio-console/console_subscriber to the data-channels-flow-control example, so I can take a look?
@rainliu Added PR
webrtc-rs/examples#1