webrtc-rs/webrtc

[WebRTC] investigate data-channels-flow-control example throughput performance issue

rainliu opened this issue · 13 comments

Pion achieves more than 500 Mbps:

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
2021/09/25 13:27:29 OnOpen: data-824638619994. Start sending a series of 1024-byte packets as fast as it can
2021/09/25 13:27:29 OnOpen: data-824636958938. Start receiving data
2021/09/25 13:27:30 Throughput: 570.646 Mbps
2021/09/25 13:27:31 Throughput: 569.753 Mbps
2021/09/25 13:27:32 Throughput: 573.001 Mbps
2021/09/25 13:27:33 Throughput: 572.452 Mbps
2021/09/25 13:27:34 Throughput: 571.297 Mbps
2021/09/25 13:27:35 Throughput: 569.525 Mbps
2021/09/25 13:27:36 Throughput: 567.463 Mbps
...

but webrtc-rs only reaches around 13 Mbps:

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 12.990 Mbps
Throughput: 13.698 Mbps
Throughput: 13.559 Mbps
Throughput: 13.345 Mbps
Throughput: 13.565 Mbps
Throughput: 13.582 Mbps

Building with cargo build --release --example data-channels-flow-control improves performance, but it is still not comparable to Pion.

./target/release/examples/data-channels-flow-control
Press ctlr-c to stop
Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 175.556 Mbps
Throughput: 106.104 Mbps
Throughput: 76.986 Mbps
Throughput: 61.450 Mbps
Throughput: 51.632 Mbps
Throughput: 44.797 Mbps
Throughput: 39.733 Mbps
Throughput: 35.619 Mbps
Throughput: 32.330 Mbps
Throughput: 29.491 Mbps
Throughput: 43.142 Mbps
Throughput: 48.350 Mbps
Throughput: 46.386 Mbps
Throughput: 44.221 Mbps
Throughput: 48.071 Mbps
Throughput: 55.550 Mbps
Throughput: 53.980 Mbps
Throughput: 52.263 Mbps

whans commented

This is probably a tokio performance limit.

Here is another benchmark:

goroutines: 3.22234675s total, 3.222346ms avg per iteration
rust_threads: 16.980509645s total, 16.980509ms avg per iteration
rust_tokio: 9.56997204s total, 9.569972ms avg per iteration
rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration

https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/

@rainliu I made some long-running benchmarks comparing Go (Pion) and Rust; maybe they will help.


Go (Pion)
From the start of the benchmark, throughput grew up to 844 Mbps:

17:49:06 Throughput: 721.371 Mbps
17:49:07 Throughput: 727.991 Mbps
17:49:08 Throughput: 743.665 Mbps
...
17:49:39 Throughput: 842.728 Mbps
17:49:40 Throughput: 843.339 Mbps
17:49:41 Throughput: 843.672 Mbps
17:49:42 Throughput: 844.272 Mbps
17:49:43 Throughput: 844.782 Mbps
17:49:44 Throughput: 844.855 Mbps

After that, an error was thrown:
mux ERROR: 17:49:45 mux: ending readLoop dispatch error packetio.Buffer is full, discarding write

and throughput started to slowly drop without settling at a stable point.
I stopped the benchmark at:

18:44:45 Throughput: 9.966 Mbps

Rust
From the start of the benchmark, throughput was:

Throughput: 229.521 Mbps
Throughput: 231.489 Mbps
Throughput: 231.780 Mbps
Throughput: 231.662 Mbps
Throughput: 231.965 Mbps

After that, it started to drop and reached its lowest point:

Throughput: 23.023 Mbps
Throughput: 22.849 Mbps
Throughput: 22.677 Mbps

Then it slowly started to recover, oscillating between 41.511 Mbps and 66.436 Mbps.


CPU/RAM

Go

RAM 22.304 MB
CPU ~1%

Rust

RAM 162 MB (stopped at that point, without the memory being released)
CPU ~120%

@whans Those benchmarks were made on file I/O, not on sockets, and Linux behaves differently for files and sockets. If you try std's file I/O inside an async block or task::block_in_place you will get very fast numbers, because Linux uses read-ahead for files. Also, those benchmarks used /dev/urandom and /dev/null, which are in-memory files.

@rainliu I suppose the example here uses a socket connection.
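
As a minimal sketch of that difference (my own illustration, not code from the linked thread), a blocking read of /dev/urandom driven through task::block_in_place returns almost immediately, which says little about how an async UDP socket under load behaves:

use std::io::Read;

#[tokio::main(flavor = "multi_thread")]
async fn main() -> std::io::Result<()> {
    let mut buf = vec![0u8; 1024];
    // block_in_place runs the blocking closure on the current worker thread,
    // telling the runtime to move other tasks elsewhere in the meantime.
    let n = tokio::task::block_in_place(|| -> std::io::Result<usize> {
        // Plain std file I/O; /dev/urandom is served from memory, so this
        // returns right away, unlike a socket waiting on a remote peer.
        let mut f = std::fs::File::open("/dev/urandom")?;
        f.read(&mut buf)
    })?;
    println!("read {} bytes", n);
    Ok(())
}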

whans commented

@vitdevelop thanks.
Comparing std::net::UdpSocket with tokio::net::UdpSocket, std::net::UdpSocket is almost twice as fast.
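
For reference, here is a rough loopback sketch of that kind of comparison (my own, with an arbitrary packet count and payload size, and ignoring the possibility of dropped datagrams): it pushes 1024-byte packets through a connected UDP pair with std and with tokio and prints the elapsed time.

use std::time::Instant;

const ITERS: usize = 100_000;
const PAYLOAD: [u8; 1024] = [0u8; 1024];

fn bench_std() -> std::io::Result<f64> {
    // Blocking sockets: send/receive 1024-byte datagrams over loopback.
    let tx = std::net::UdpSocket::bind("127.0.0.1:0")?;
    let rx = std::net::UdpSocket::bind("127.0.0.1:0")?;
    tx.connect(rx.local_addr()?)?;
    let mut buf = [0u8; 2048];
    let start = Instant::now();
    for _ in 0..ITERS {
        tx.send(&PAYLOAD)?;
        rx.recv(&mut buf)?;
    }
    Ok(start.elapsed().as_secs_f64())
}

async fn bench_tokio() -> std::io::Result<f64> {
    // The same loop, but on tokio's async sockets.
    let tx = tokio::net::UdpSocket::bind("127.0.0.1:0").await?;
    let rx = tokio::net::UdpSocket::bind("127.0.0.1:0").await?;
    tx.connect(rx.local_addr()?).await?;
    let mut buf = [0u8; 2048];
    let start = Instant::now();
    for _ in 0..ITERS {
        tx.send(&PAYLOAD).await?;
        rx.recv(&mut buf).await?;
    }
    Ok(start.elapsed().as_secs_f64())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let t_std = tokio::task::spawn_blocking(bench_std).await.unwrap()?;
    let t_tokio = bench_tokio().await?;
    println!("std::net::UdpSocket:   {:.3}s for {} packets", t_std, ITERS);
    println!("tokio::net::UdpSocket: {:.3}s for {} packets", t_tokio, ITERS);
    Ok(())
}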

Thanks @vitdevelop and @whans for the benchmarking.

Looks like we need some effort to profile the hotspots/bottlenecks.

whans commented

Add tokio-console to check for scheduling issues:

https://github.com/tokio-rs/console
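
A minimal sketch of what that wiring could look like in the example's main (assuming console-subscriber is added to Cargo.toml and the binary is built with RUSTFLAGS="--cfg tokio_unstable"):

#[tokio::main]
async fn main() {
    // Start the instrumentation layer that the tokio-console CLI attaches to
    // (it listens on 127.0.0.1:6669 by default).
    console_subscriber::init();

    // ... the rest of the data-channels-flow-control setup stays unchanged ...
}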

@whans Thanks for tokio-console, an awesome tool.

I tried to check the busy/idle times of tasks with tokio-console, but it could not connect to console_subscriber. Then I figured out that I can connect before the offer/answer exchange, so I put in some tokio::time::sleep calls to get a picture; once the offer/answer started executing, the console hangs up. The sleep points were:

  • 10 sec before create_offerer
  • 3 sec before create_answerer
  • 3 sec before create_offer and set_remote_description
  • 3 sec before create_answer and set_remote_description

Here is the last tokio-console data:
(screenshot: webrtc_rs_tokio_console)

whans commented

@vitdevelop
You need to slow down the packet sending rate: add a sleep in the sending task.
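
A rough sketch of that idea (not the example's actual send loop; the helper name and the 1 ms pause are mine), assuming the webrtc crate's RTCDataChannel::send(&Bytes) API:

use std::sync::Arc;
use std::time::Duration;
use bytes::Bytes;
use webrtc::data_channel::RTCDataChannel;

// Sends 1024-byte packets until send() fails, pausing briefly between packets.
async fn send_throttled(dc: Arc<RTCDataChannel>) -> webrtc::error::Result<()> {
    let packet = Bytes::from_static(&[0u8; 1024]);
    loop {
        dc.send(&packet).await?;
        // Yield back to the runtime so the SCTP ack/rtx timer tasks get a
        // chance to run instead of being starved by the send loop.
        tokio::time::sleep(Duration::from_millis(1)).await;
    }
}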

@whans A sleep in the sending task helped for some time.

@rainliu After ~5 minutes I noticed that a lot of tasks from webrtc-sctp-0.3.8/src/timer/[ack,rtx]_timer.rs:[43,153] had been spawned, around 500-600.
Most of the CPU busy time is in those tasks.

I attached the screenshot.
(screenshot: webrtc_rs_ack_rtc_timer)

@vitdevelop, thanks for the finding. Could you submit a PR that adds tokio-console/console_subscriber to the data-channels-flow-control example, so I can take a look?

whans commented

Output of perf top -p:

(screenshot: Xnip2021-10-11_08-19-51)