cloud-bulldozer/e2e-benchmarking

router-perf: latency calculation is not correct

qiliRedHat opened this issue · 2 comments

In mb the delay is the second column in the output csv file

start_request(1),delay(2),status(3),written(4),read(5),method_and_url(6),thread_id(7),conn_id(8),conns(9),reqs(10),start(11),socket_writable(12),conn_est(13),tls_reuse(14),err(15)
delay: time in microseconds it took for a full response (e.g. a complete chunk-encoded message) to arrive since ``start_request''. If there was an error before we received the full response, the delay is the time it took to receive the error.

workload.py gets the second element to count the latency

result = subprocess.run(cmd,
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
timeout=int(runtime) * 5)
if result.returncode:
print("mb execution error:")
print(result.stderr.decode("utf-8"))
exit(1)
results_csv = csv.reader(open(output))
for hit in results_csv:
if hit[2] not in result_codes:
result_codes[hit[2]] = 0
result_codes[hit[2]] += 1
# Record latency of 'SUCCESS' 200 OK response only
if hit[2] == "200":
latency_list.append(int(hit[1]))
if latency_list:
p95_latency = numpy.percentile(latency_list, 95)
p99_latency = numpy.percentile(latency_list, 99)
avg_latency = numpy.average(latency_list)
return result_codes, p95_latency, p99_latency, avg_latency

But in real test, with Executing 'mb -i /tmp/http-scale-mix.json -d 60 -o /tmp/results.csv', the total time is 60 seconds, but I get many results that the latency, could be avg and p95, are even longer than 60,000 ms
e.g.

11-29 12:40:56.869  �[1mTue Nov 29 04:40:52 UTC 2022 Executing sample 1/2 using termination mix with 1 clients and 1 keepalive requests�[0m
11-29 12:40:56.869  Unable to use a TTY - input is not a terminal or the right kind of file
11-29 12:42:04.459  Executing 'mb -i /tmp/http-scale-mix.json -d 60 -o /tmp/results.csv'
11-29 12:42:04.459  Workload finished, results:
11-29 12:42:04.459  {
11-29 12:42:04.459      "termination": "mix",
11-29 12:42:04.459      "test_type": "mix",
11-29 12:42:04.459      "uuid": "2581f923-af0c-4ef0-a41d-e177c172b906",
11-29 12:42:04.459      "cluster.id": "df292922-446b-48e3-b8ea-034cae269eae",
11-29 12:42:04.459      "cluster.name": "qili-gcp-sdn-bnz5r",
11-29 12:42:04.459      "cluster.ocp_version": "4.12.0-0.nightly-2022-11-28-145121",
11-29 12:42:04.459      "cluster.kubernetes_version": "1.25",
11-29 12:42:04.459      "cluster.sdn": "OpenShiftSDN",
11-29 12:42:04.459      "cluster.platform": "GCP",
11-29 12:42:04.459      "requests_per_second": 12763,
11-29 12:42:04.459      "avg_latency": 156151,
11-29 12:42:04.459      "latency_95pctl": 530888,
11-29 12:42:04.459      "latency_99pctl": 615420,
11-29 12:42:04.459      "host_network": "true",
11-29 12:42:04.459      "sample": "1",
11-29 12:42:04.459      "runtime": 60,
11-29 12:42:04.459      "routes": 2000,
11-29 12:42:04.459      "conn_per_targetroute": 1,
11-29 12:42:04.459      "keepalive": 1,
11-29 12:42:04.459      "tls_reuse": true,
11-29 12:42:04.459      "number_of_routers": "2",
11-29 12:42:04.459      "200": 765836
11-29 12:42:04.459  }

If I read it correctly, delay: time in microseconds they are 615420 microseconds = 0.6s

Oh! sorry I misreading it as millisecond :P
1s =1000 ms/millisecond = 1,000,000 μs/ microseconds