webrtc-rs/rtp

Benchmark RTP

Closed this issue · 7 comments

Benchmark the RTP crate and compare its performance with Pion RTP or other implementations.

@metaclips, this task is assigned to you since you are working on performance improvements with the Bytes crate.

  • Using Reader/Write Trait (v0.1.0)

Benchmarking Benchmark Marshal
Benchmarking Benchmark Marshal: Warming up for 3.0000 s
Benchmarking Benchmark Marshal: Collecting 100 samples in estimated 5.0001 s (21M iterations)
Benchmarking Benchmark Marshal: Analyzing
Benchmark Marshal time: [236.13 ns 239.02 ns 242.54 ns]
change: [+5.7586% +7.0876% +8.6679%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

Benchmarking Benchmark Unmarshal
Benchmarking Benchmark Unmarshal : Warming up for 3.0000 s
Benchmarking Benchmark Unmarshal : Collecting 100 samples in estimated 5.0016 s (6.3M iterations)
Benchmarking Benchmark Unmarshal : Analyzing
Benchmark Unmarshal time: [790.74 ns 791.85 ns 792.98 ns]
change: [+5.4865% +5.8001% +6.1335%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

  • Using std slice (v0.1.1)

Benchmarking Marshal Benchmark
Benchmarking Marshal Benchmark: Warming up for 3.0000 s
Benchmarking Marshal Benchmark: Collecting 100 samples in estimated 5.0005 s (52M iterations)
Benchmarking Marshal Benchmark: Analyzing
Marshal Benchmark time: [96.787 ns 97.109 ns 97.384 ns]
change: [+1.9773% +2.2890% +2.5474%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
5 (5.00%) low mild
2 (2.00%) high mild

Benchmarking Marshal_To Benchmark
Benchmarking Marshal_To Benchmark: Warming up for 3.0000 s
Benchmarking Marshal_To Benchmark: Collecting 100 samples in estimated 5.0001 s (112M iterations)
Benchmarking Marshal_To Benchmark: Analyzing
Marshal_To Benchmark time: [45.694 ns 45.734 ns 45.771 ns]
change: [-10.316% -8.7426% -7.4155%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high mild

Benchmarking Shared Struct
Benchmarking Shared Struct: Warming up for 3.0000 s
Benchmarking Shared Struct: Collecting 100 samples in estimated 5.0005 s (22M iterations)
Benchmarking Shared Struct: Analyzing
Shared Struct time: [216.52 ns 217.21 ns 218.01 ns]
change: [-8.8653% -2.0314% +5.5339%] (p = 0.66 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

Benchmarking New Struct
Benchmarking New Struct: Warming up for 3.0000 s
Benchmarking New Struct: Collecting 100 samples in estimated 5.0010 s (22M iterations)
Benchmarking New Struct: Analyzing
New Struct time: [201.05 ns 201.44 ns 201.84 ns]
change: [+3.9420% +4.3199% +4.7129%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe

  • Using Bytes crate (branch v0.2.0_bytes)

Benchmarking Benchmark Marshal
Benchmarking Benchmark Marshal: Warming up for 3.0000 s
Benchmarking Benchmark Marshal: Collecting 100 samples in estimated 5.0001 s (28M iterations)
Benchmarking Benchmark Marshal: Analyzing
Benchmark Marshal time: [183.42 ns 184.41 ns 185.41 ns]
change: [-3.0654% -2.5110% -1.9095%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

Benchmarking Benchmark Unmarshal
Benchmarking Benchmark Unmarshal : Warming up for 3.0000 s
Benchmarking Benchmark Unmarshal : Collecting 100 samples in estimated 5.0001 s (35M iterations)
Benchmarking Benchmark Unmarshal : Analyzing
Benchmark Unmarshal time: [141.36 ns 141.81 ns 142.33 ns]
change: [-2.2556% -1.9829% -1.6813%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

From the benchmark results, I think we should go ahead and use the Bytes crate, with additional optimization effort on Marshal.

With Bytes, Unmarshal can be zero-copy. Hopefully we can achieve the same zero-copy behavior for Marshal.
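To make the zero-copy point concrete, here is a minimal sketch of the idea, not the crate's actual code. It assumes a std-only stand-in for a reference-counted byte view (the hypothetical `SharedBytes` type below, built on `Arc<[u8]>`), which is conceptually what `bytes::Bytes` provides. The `Packet` struct and `unmarshal` function are also illustrative names, and header extensions and padding are deliberately not handled.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical std-only stand-in for a reference-counted byte view.
#[derive(Clone, Debug)]
struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    /// Takes ownership of the received datagram. The bytes are moved into
    /// the shared allocation once, here; later views never copy them again.
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedBytes { buf: Arc::from(data), range: 0..len }
    }

    /// Narrows the view without copying; only the reference count changes.
    fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && r.end <= self.range.len());
        let start = self.range.start + r.start;
        let end = self.range.start + r.end;
        SharedBytes { buf: Arc::clone(&self.buf), range: start..end }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Illustrative packet: a few fixed-header fields plus a payload view.
#[derive(Debug)]
struct Packet {
    sequence_number: u16,
    timestamp: u32,
    ssrc: u32,
    payload: SharedBytes,
}

/// Parses the 12-byte fixed RTP header and skips any CSRC entries.
/// The payload is a view into `raw`, not a fresh allocation.
fn unmarshal(raw: &SharedBytes) -> Option<Packet> {
    let b = raw.as_slice();
    if b.len() < 12 {
        return None;
    }
    let csrc_count = (b[0] & 0x0F) as usize;
    let header_len = 12 + 4 * csrc_count;
    if b.len() < header_len {
        return None;
    }
    Some(Packet {
        sequence_number: u16::from_be_bytes([b[2], b[3]]),
        timestamp: u32::from_be_bytes([b[4], b[5], b[6], b[7]]),
        ssrc: u32::from_be_bytes([b[8], b[9], b[10], b[11]]),
        payload: raw.slice(header_len..b.len()),
    })
}
```

The payload shares the allocation of the incoming buffer, so the per-packet cost of Unmarshal is header parsing plus a reference-count increment. Marshal is harder to make zero-copy because the header and payload normally have to end up contiguous in one output buffer.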

cc @metaclips

The bytes results come from branch https://github.com/webrtc-rs/rtp/tree/v0.2.0_bytes

Hello @rainliu, here are the benchmark results comparing the Bytes crate with the std types.

std

Marshal Benchmark       time:   [90.550 ns 91.886 ns 93.577 ns]                              
                        change: [+1.3891% +2.6578% +4.0469%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Marshal_To Benchmark    time:   [24.746 ns 24.929 ns 25.169 ns]                                  
                        change: [-6.9286% -3.5316% -0.0573%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

Shared Struct           time:   [294.06 ns 298.22 ns 303.17 ns]                          
                        change: [-24.726% +6.7566% +50.712%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

New Struct              time:   [529.49 ns 533.24 ns 537.31 ns]                        
                        change: [+5.3625% +6.8160% +8.3214%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

bytes

Gnuplot not found, using plotters backend
Marshal Benchmark       time:   [100.12 ns 100.78 ns 101.46 ns]                              
                        change: [+9.8439% +11.545% +13.149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

Marshal_To Benchmark    time:   [24.042 ns 24.320 ns 24.591 ns]                                  
                        change: [-6.7403% -4.6760% -2.7840%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Shared Struct           time:   [287.05 ns 292.40 ns 299.72 ns]                          
                        change: [-29.391% +1.2170% +46.805%] (p = 0.89 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

New Struct              time:   [520.20 ns 525.23 ns 530.76 ns]                        
                        change: [-2.2467% -1.0919% +0.1707%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

I'll look further into this. Note: the std implementation has not yet been optimised.

The bytes results come from branch https://github.com/webrtc-rs/rtp/tree/v0.2.0_bytes

Noted, will look into it

Using BytesMut::with_capacity greatly improves Marshal performance.

Gnuplot not found, using plotters backend
Benchmarking Benchmark Marshal
Benchmarking Benchmark Marshal: Warming up for 3.0000 s
Benchmarking Benchmark Marshal: Collecting 100 samples in estimated 5.0001 s (50M iterations)
Benchmarking Benchmark Marshal: Analyzing
Benchmark Marshal time: [95.906 ns 96.506 ns 97.114 ns]
change: [-16.478% -12.922% -9.3894%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe

Benchmarking Benchmark Unmarshal
Benchmarking Benchmark Unmarshal : Warming up for 3.0000 s
Benchmarking Benchmark Unmarshal : Collecting 100 samples in estimated 5.0002 s (30M iterations)
Benchmarking Benchmark Unmarshal : Analyzing
Benchmark Unmarshal time: [156.56 ns 158.23 ns 160.03 ns]
change: [-26.199% -23.574% -20.876%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

Process finished with exit code 0
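
For readers wondering why pre-sizing helps Marshal, the sketch below shows the pattern under stated assumptions. It assumes the gain comes from avoiding buffer regrowth while the packet is written, and it uses `Vec::with_capacity` from std as a dependency-free stand-in; with the bytes crate the analogous calls would be `BytesMut::with_capacity` and `BufMut::put_slice`. The `Header` struct and `marshal` function are hypothetical and cover only the fixed 12-byte header.

```rust
/// Fixed RTP header fields used by this sketch (hypothetical struct).
struct Header {
    marker: bool,
    payload_type: u8,
    sequence_number: u16,
    timestamp: u32,
    ssrc: u32,
}

const FIXED_HEADER_LEN: usize = 12;

/// Serializes header and payload into one buffer that is sized up front.
fn marshal(h: &Header, payload: &[u8]) -> Vec<u8> {
    // Reserve the full packet size once so the writes below never
    // trigger a reallocation.
    let mut buf = Vec::with_capacity(FIXED_HEADER_LEN + payload.len());
    buf.push(0x80); // version 2, no padding, no extension, zero CSRCs
    buf.push(((h.marker as u8) << 7) | (h.payload_type & 0x7F));
    buf.extend_from_slice(&h.sequence_number.to_be_bytes());
    buf.extend_from_slice(&h.timestamp.to_be_bytes());
    buf.extend_from_slice(&h.ssrc.to_be_bytes());
    buf.extend_from_slice(payload);
    buf
}
```

Because the final length is known before the first byte is written, a single allocation covers the whole packet; starting from an empty buffer would instead grow it in steps as fields are appended.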