[Advice] Provide performance testing documentation and data
tanjunchen opened this issue · 3 comments
What additional L4 and L7 network latency does coroot-node-agent introduce?
What is the impact of the eBPF-based application topology, tracing, etc. on the application?
Could the official documentation provide performance/load-test data?
In our testing, with coroot-node-agent enabled the p90 network latency increased by 6460us and QPS dropped by roughly 50%, as shown in the flame graph below.
Why is #define MAX_PAYLOAD_SIZE 1024? Is the L7 handling logic here particularly time-consuming, and why 1024 bytes?
The performance test procedure:
- Deploy Coroot according to the documentation on the Coroot website.
➜ ebpf-performance kubectl -n coroot get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coroot-68d887b548-4fhkn 1/1 Running 0 16d 10.2.2.10 192.168.1.14 <none> <none>
coroot-clickhouse-shard0-0 1/1 Running 0 16d 10.2.2.54 192.168.1.14 <none> <none>
coroot-kube-state-metrics-597cfdc9f5-pjvxm 1/1 Running 0 16d 10.2.2.209 192.168.1.14 <none> <none>
coroot-node-agent-6wshb 1/1 Running 0 16d 10.2.2.219 192.168.1.14 <none> <none>
coroot-node-agent-cfsfx 1/1 Running 0 16d 10.2.1.124 192.168.1.20 <none> <none>
coroot-node-agent-rt8hk 1/1 Running 0 16d 10.2.0.110 192.168.1.24 <none> <none>
coroot-opentelemetry-collector-6659857566-nw4m4 1/1 Running 0 40h 10.2.2.160 192.168.1.14 <none> <none>
coroot-prometheus-server-669b7ccbb6-jfvzn 2/2 Running 0 16d 10.2.2.216 192.168.1.14 <none> <none>
coroot-pyroscope-6fb8fc4db-l5df5 1/1 Running 0 16d 10.2.2.102 192.168.1.14 <none> <none>
coroot-pyroscope-ebpf-6c6wx 1/1 Running 0 16d 10.2.0.54 192.168.1.24 <none> <none>
coroot-pyroscope-ebpf-dj6c6 1/1 Running 0 16d 10.2.2.61 192.168.1.14 <none> <none>
coroot-pyroscope-ebpf-tjkcq 1/1 Running 0 16d 10.2.1.59 192.168.1.20 <none> <none>
- Deploy the client and the server for the performance test; the benchmark command is:
taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://ip --latency
➜ ebpf-performance kubectl -n cilium-test get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-b89648f96-2bz7r 1/1 Running 0 11m 10.2.2.57 192.168.1.14 <none> <none>
wrk-58fb8c49ff-d7p2c 1/1 Running 0 33m 10.2.1.161 192.168.1.20 <none> <none>
- Start the performance test from the client against the server.
Result without Coroot installed:
[root@wrk-58fb8c49ff-s4g8b /]# taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://172.16.2.191 --latency
Running 1m test @ http://172.16.2.191
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 286.99us 357.21us 12.03ms 96.71%
Req/Sec 8.22k 1.90k 16.70k 89.92%
Latency Distribution
50% 235.00us
75% 252.00us
90% 297.00us
99% 2.23ms
982111 requests in 1.00m, 796.08MB read
Requests/sec: 16366.99
Transfer/sec: 13.27MB
Result with Coroot installed:
[root@wrk-58fb8c49ff-d7p2c /]# taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://10.2.2.57 --latency
Running 1m test @ http://10.2.2.57
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.78ms 7.83ms 182.95ms 92.91%
Req/Sec 3.22k 1.58k 8.56k 60.55%
Latency Distribution
50% 394.00us
75% 1.43ms
90% 7.29ms
99% 33.57ms
384280 requests in 1.00m, 311.49MB read
Requests/sec: 6396.37
Transfer/sec: 5.18MB
- The test environment:
OS: Ubuntu 20.04 LTS amd64 (64-bit)
CRI: containerd 1.6.20
Kubernetes version: 1.24.4
Kernel version: 5.4.0-139-generic
- The YAML manifests for the client and the server:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrk
spec:
  selector:
    matchLabels:
      run: wrk
  replicas: 1
  template:
    metadata:
      labels:
        run: wrk
    spec:
      initContainers:
      - name: setsysctl
        image: xxx/busybox:latest
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=65535
          sysctl -w net.ipv4.ip_local_port_range="1024 65535"
          sysctl -w net.ipv4.tcp_tw_reuse=1
          sysctl -w fs.file-max=1048576
      containers:
      - name: wrk
        image: xxx/wrk:4.2.0
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  minReadySeconds: 0
  strategy:
    type: RollingUpdate  # strategy: rolling update
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        sidecar.istio.io/inject: "false"
        app: nginx
    spec:
      restartPolicy: Always
      initContainers:
      - name: setsysctl
        image: xxx/busybox:latest
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=65535
          sysctl -w net.ipv4.ip_local_port_range="1024 65535"
          sysctl -w net.ipv4.tcp_tw_reuse=1
          sysctl -w fs.file-max=1048576
      containers:
      - name: nginx
        image: xxx/nginx:1.14.2
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        command:
        - /bin/sh
        - -c
        - "cd /usr/share/nginx/html/ && dd if=/dev/zero of=1k bs=1k count=1 && dd if=/dev/zero of=100k bs=1k count=100 && nginx -g \"daemon off;\""
eBPF limitations make it challenging to fully implement all L7 parsing on the kernel side. The choice of a 1024-byte payload size has been made to provide sufficient data for parsing L7 protocols in userland.
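For illustration only, here is a minimal sketch of the usual kernel-side pattern (this is not coroot-node-agent's actual code; the struct, map, and program names are made up): the eBPF verifier requires a compile-time bound on every copy, so the program can only capture a fixed-size prefix of each message and ship it to userspace, where the real L7 protocol parsing happens.

// sketch_l7_capture.bpf.c -- hypothetical example, not the agent's source.
#include <linux/bpf.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

#define MAX_PAYLOAD_SIZE 1024

struct l7_event {
    __u64 fd;
    __u64 size;                      /* real payload length (may exceed the copy) */
    char payload[MAX_PAYLOAD_SIZE];  /* truncated prefix shipped to userspace */
};

/* The event is too large for the 512-byte BPF stack, so use per-CPU scratch space. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct l7_event);
} event_heap SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} l7_events SEC(".maps");

/* Simplified argument layout of the syscalls/sys_enter_write tracepoint. */
struct sys_enter_write_ctx {
    __u64 _common;        /* common tracepoint fields */
    __s64 syscall_nr;
    __u64 fd;
    const char *buf;
    __u64 count;
};

SEC("tracepoint/syscalls/sys_enter_write")
int capture_write_payload(struct sys_enter_write_ctx *ctx)
{
    __u32 zero = 0;
    struct l7_event *e = bpf_map_lookup_elem(&event_heap, &zero);
    if (!e)
        return 0;

    e->fd = ctx->fd;
    e->size = ctx->count;

    /* The verifier demands a compile-time upper bound on the copy, so larger
       messages are truncated to the first MAX_PAYLOAD_SIZE bytes. */
    __u64 to_copy = ctx->count;
    if (to_copy > MAX_PAYLOAD_SIZE)
        to_copy = MAX_PAYLOAD_SIZE;
    bpf_probe_read_user(e->payload, to_copy, ctx->buf);

    bpf_perf_event_output(ctx, &l7_events, BPF_F_CURRENT_CPU, e, sizeof(*e));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

A larger buffer would presumably mean more bytes copied and pushed through the perf buffer per request, which is the trade-off behind keeping MAX_PAYLOAD_SIZE at 1024 bytes.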
I've implemented several performance optimizations using your benchmark approach.
- Without coroot-node-agent:
taskset -c 1-2 wrk -t 2 -c 4 -d 60s http://10.42.0.9:80/ --latency
Running 1m test @ http://10.42.0.9:80/
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 173.83us 114.66us 8.32ms 97.46%
Req/Sec 11.64k 1.10k 14.84k 72.46%
Latency Distribution
50% 158.00us
75% 185.00us
90% 229.00us
99% 365.00us
1391869 requests in 1.00m, 1.10GB read
Requests/sec: 23159.30
Transfer/sec: 18.77MB
- With coroot-node-agent v1.14.0 (with optimizations):
taskset -c 1-2 wrk -t 2 -c 4 -d 60s http://10.42.0.9:80/ --latency
Running 1m test @ http://10.42.0.9:80/
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 189.95us 129.27us 9.60ms 96.94%
Req/Sec 10.62k 1.11k 15.71k 72.67%
Latency Distribution
50% 171.00us
75% 207.00us
90% 259.00us
99% 414.00us
1268200 requests in 1.00m, 1.00GB read
Requests/sec: 21136.58
Transfer/sec: 17.13MB
This ~9% drop in request throughput can be attributed to the agent's CPU consumption, which reaches about 30% of one CPU core. I expect that without competition for CPU time the degradation would be much smaller. At the eBPF level, the kernel ensures that the observer program does not introduce significant additional latency.
@tanjunchen, thank you for bringing up this topic.
We've added the benchmark results to the documentation: https://coroot.com/docs/coroot-community-edition/getting-started/performance-impact