cisco-system-traffic-generator/trex-core

Latency measurements vary between different test runs on E810.

MaciejWachowski opened this issue · 3 comments

Short description: Latencies are stable within a test run but vary between different test runs.

I prepared an automated script to measure latency in my environment on an E810 card.
I am sampling the average latency across all streams (2) every 0.1 s.
I am gathering the average latency using the STL API (stats() function).
I did 5 runs of the same traffic profile with the same parameters (BW, packet size, etc.).
I noticed that latency is stable within a test run, for example:

[
 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0,
 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0,
 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0,
 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0,
 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0,
 ....
 ]
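A sampling loop of the kind described above could look roughly like this. It is a minimal sketch, not the actual script: `sample_latency` and `read_avg_latency` are hypothetical names, and the reader callable is assumed to wrap whatever call returns the current average latency across the streams (for example, a wrapper around the STL API stats).

```python
import time
from typing import Callable, List


def sample_latency(read_avg_latency: Callable[[], float],
                   duration_s: float,
                   interval_s: float = 0.1,
                   sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Collect one average-latency reading every `interval_s` seconds.

    `read_avg_latency` is a hypothetical callable supplied by the caller;
    it is assumed to return the current average latency as a float.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    samples: List[float] = []
    n_samples = int(round(duration_s / interval_s))
    for _ in range(n_samples):
        samples.append(read_avg_latency())
        sleep(interval_s)
    return samples
```

Passing the reader in as a callable keeps the loop independent of the traffic generator, so the same code can be dry-run with a fake reader before a real test.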

But it differs between test runs.
(I took measurements for different BW, packet sizes, and durations (from 2 minutes to 60 minutes).)

latency_ours -> average latency over all samples
latency_avg_calculated -> average latency reported by TRex at the end of the test run
As you can see, latency_ours and latency_avg_calculated are almost identical within each run, which is expected.

RUN 1:
latency_ours	latency_avg_calculated
21.08529412	20.5

RUN 2:
latency_ours	latency_avg_calculated
74.55084034	76.5

RUN 3:
latency_ours	latency_avg_calculated
69.08067227	70.5

RUN 4:
latency_ours	latency_avg_calculated
69.08067227	70.5

RUN 5:
latency_ours	latency_avg_calculated
18.75672269	21.5
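
To put a number on the run-to-run variation, the five latency_ours values above can be summarized as below. This is only an illustrative sketch; `run_spread` is a hypothetical helper, and the values are copied from RUN 1 to RUN 5.

```python
from statistics import mean

# latency_ours values copied from RUN 1..5 above
runs = {
    "RUN 1": 21.08529412,
    "RUN 2": 74.55084034,
    "RUN 3": 69.08067227,
    "RUN 4": 69.08067227,
    "RUN 5": 18.75672269,
}


def run_spread(values):
    """Summarize how far apart the per-run averages are."""
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "mean": mean(values),
        "max_over_min": hi / lo,  # ratio of slowest to fastest run
    }


spread = run_spread(list(runs.values()))
print(spread)
```

The slowest run is about 4x the fastest one, while samples within a single run move by about 1 unit, so the variation is between runs and not within them.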

Suspected root cause: random behaviour of the RSS hash functions, but unfortunately I am unable to change the hash function using ethtool.
Do you have any other idea how we can stabilize latency between runs, or is it possible to change the hash function using TRex?

hhaim commented

@MaciejWachowski this is expected in software mode and is not related to E810.
Without software mode (E810 does not support it yet) there should not be an issue.
Could you check it with a different NIC?

@hhaim OK, but we see similar differences on 700 series cards (without enabling software mode).
X710:

Run_1:
size	avg_2_ports	latency_ours
1280	153.5	158.105042
1400	148	175.610084
1518	150.5	180.9789916

Run_2:
size	avg_2_ports	latency_ours
1280	166.5	160.9256303
1400	139.5	141.0726891
1518	171	170.6436975

Run_3:
size	avg_2_ports	latency_ours
1280	171	174.4436975
1400	167	185.4630252
1518	199	198.1823529

Run_4:
size	avg_2_ports	latency_ours
1280	176.5	177.3218487
1400	193	195.8428571
1518	211.5	212.147479

XXV710:

Run_1:
size	avg_2_ports	latency_ours
1280	6.5	6.952941176
1400	7	6.378571429
1518	7	6.976470588

Run_2:
size	avg_2_ports	latency_ours
1280	15.5	15.58067227
1400	34.5	35.86302521
1518	18.5	18.43823529

Run_3:
size	avg_2_ports	latency_ours
1280	8	7.993277311
1400	12.5	12.01386555
1518	8.5	8.910084034

Run_4:
size	avg_2_ports	latency_ours
1280	7	6.988235294
1400	6.5	6.521008403
1518	7	6.992016807

hhaim commented

@hhaim in XL710 and X710 the hardware latency should be ~5-10 usec. I would look into the HDR histogram to understand the problem better (instead of latency_ours and the average).
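
As a rough illustration of the histogram suggestion: once a per-stream latency histogram is available, percentiles show whether a high average comes from a shifted baseline or from a long tail. The sketch below assumes the histogram is a dict mapping a numeric bucket lower bound to a packet count; that shape, the helper name `percentile_bucket`, and the example numbers are assumptions for illustration, not data from this issue.

```python
def percentile_bucket(histogram, pct):
    """Return the lower bound of the bucket holding the pct-th percentile.

    `histogram` is assumed to map numeric bucket lower bounds to counts.
    """
    if not histogram:
        raise ValueError("empty histogram")
    total = sum(histogram.values())
    threshold = total * pct / 100.0
    cumulative = 0
    for bucket in sorted(histogram):
        cumulative += histogram[bucket]
        if cumulative >= threshold:
            return bucket
    return max(histogram)


# Made-up example: most packets in the lowest bucket, a small tail higher up
example = {10: 900, 20: 50, 70: 50}
p50 = percentile_bucket(example, 50)
p99 = percentile_bucket(example, 99)
print(p50, p99)
```

If the median bucket stays in the expected range while the high percentiles jump between runs, the average is being pulled by a tail and not by a uniform shift.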