coroot/coroot-node-agent

[Advice] Provide performance testing documents and data

tanjunchen opened this issue · 3 comments

What additional L4 and L7 network latency does coroot-node-agent introduce?
What impact do the eBPF-based application topology, tracing, etc. have on the application?
Could the official documentation provide performance / pressure-test data?

In our testing, with coroot-node-agent deployed, p90 network latency increased by 6460us and QPS decreased by 50%, as shown in the flame graph below.
[flame graph screenshots attached]

Why is MAX_PAYLOAD_SIZE defined as 1024 (#define MAX_PAYLOAD_SIZE 1024)? Is the L7 processing logic here particularly time-consuming, and why was 1024 bytes chosen?

The performance test procedure:

  1. Deploy Coroot according to the documentation on the Coroot website.
➜  ebpf-performance kubectl -n coroot get pod -owide
NAME                                              READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
coroot-68d887b548-4fhkn                           1/1     Running   0          16d   10.2.2.10    192.168.1.14   <none>           <none>
coroot-clickhouse-shard0-0                        1/1     Running   0          16d   10.2.2.54    192.168.1.14   <none>           <none>
coroot-kube-state-metrics-597cfdc9f5-pjvxm        1/1     Running   0          16d   10.2.2.209   192.168.1.14   <none>           <none>
coroot-node-agent-6wshb                           1/1     Running   0          16d   10.2.2.219   192.168.1.14   <none>           <none>
coroot-node-agent-cfsfx                           1/1     Running   0          16d   10.2.1.124   192.168.1.20   <none>           <none>
coroot-node-agent-rt8hk                           1/1     Running   0          16d   10.2.0.110   192.168.1.24   <none>           <none>
coroot-opentelemetry-collector-6659857566-nw4m4   1/1     Running   0          40h   10.2.2.160   192.168.1.14   <none>           <none>
coroot-prometheus-server-669b7ccbb6-jfvzn         2/2     Running   0          16d   10.2.2.216   192.168.1.14   <none>           <none>
coroot-pyroscope-6fb8fc4db-l5df5                  1/1     Running   0          16d   10.2.2.102   192.168.1.14   <none>           <none>
coroot-pyroscope-ebpf-6c6wx                       1/1     Running   0          16d   10.2.0.54    192.168.1.24   <none>           <none>
coroot-pyroscope-ebpf-dj6c6                       1/1     Running   0          16d   10.2.2.61    192.168.1.14   <none>           <none>
coroot-pyroscope-ebpf-tjkcq                       1/1     Running   0          16d   10.2.1.59    192.168.1.20   <none>           <none>
  2. Deploy the client and server for the performance test; the load is generated with taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://<ip> --latency.
➜  ebpf-performance kubectl -n cilium-test get pod -owide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
nginx-b89648f96-2bz7r   1/1     Running   0          11m   10.2.2.57    192.168.1.14   <none>           <none>
wrk-58fb8c49ff-d7p2c    1/1     Running   0          33m   10.2.1.161   192.168.1.20   <none>           <none>
  3. Run the performance test between the client and server.
    The result without Coroot deployed:
[root@wrk-58fb8c49ff-s4g8b /]# taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://172.16.2.191 --latency
Running 1m test @ http://172.16.2.191
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   286.99us  357.21us  12.03ms   96.71%
    Req/Sec     8.22k     1.90k   16.70k    89.92%
  Latency Distribution
     50%  235.00us
     75%  252.00us
     90%  297.00us
     99%    2.23ms
  982111 requests in 1.00m, 796.08MB read
Requests/sec:  16366.99
Transfer/sec:     13.27MB

The result with Coroot deployed:

[root@wrk-58fb8c49ff-d7p2c /]# taskset -c 0-1 wrk -t 2 -c 4 -d 60s http://10.2.2.57 --latency
Running 1m test @ http://10.2.2.57
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.78ms    7.83ms 182.95ms   92.91%
    Req/Sec     3.22k     1.58k    8.56k    60.55%
  Latency Distribution
     50%  394.00us
     75%    1.43ms
     90%    7.29ms
     99%   33.57ms
  384280 requests in 1.00m, 311.49MB read
Requests/sec:   6396.37
Transfer/sec:      5.18MB
  4. The test environment:
OS: Ubuntu 20.04 LTS amd64 (64-bit)
CRI: containerd 1.6.20
Kubernetes version: 1.24.4
Kernel version: 5.4.0-139-generic
  5. The YAML manifests of the client and server:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrk
spec:
  selector:
    matchLabels:
      run: wrk
  replicas: 1
  template:
    metadata:
      labels:
        run: wrk
    spec:
      initContainers:
      - name: setsysctl
        image: xxx/busybox:latest
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=65535
          sysctl -w net.ipv4.ip_local_port_range="1024 65535"
          sysctl -w net.ipv4.tcp_tw_reuse=1
          sysctl -w fs.file-max=1048576
      containers:
      - name: wrk
        image: xxx/wrk:4.2.0
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  minReadySeconds: 0
  strategy:
    type: RollingUpdate # strategy: rolling update
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        sidecar.istio.io/inject: "false"
        app: nginx
    spec:
      restartPolicy: Always
      initContainers:
        - name: setsysctl
          image: xxx/busybox:latest
          securityContext:
            privileged: true
          command:
            - sh
            - -c
            - |
              sysctl -w net.core.somaxconn=65535
              sysctl -w net.ipv4.ip_local_port_range="1024 65535"
              sysctl -w net.ipv4.tcp_tw_reuse=1
              sysctl -w fs.file-max=1048576
      containers:
        - name: nginx
          image: xxx/nginx:1.14.2
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          command:
            - /bin/sh
            - -c
            - "cd /usr/share/nginx/html/ && dd if=/dev/zero of=1k bs=1k count=1 && dd if=/dev/zero of=100k bs=1k count=100 && nginx -g \"daemon off;\""
def commented

eBPF limitations make it challenging to fully implement all L7 parsing on the kernel side. The choice of a 1024-byte payload size has been made to provide sufficient data for parsing L7 protocols in userland.
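
For illustration, here is a minimal, hypothetical sketch (not coroot-node-agent's actual code) of the pattern described above: an eBPF program copies at most MAX_PAYLOAD_SIZE bytes of an observed payload into a per-CPU buffer and ships it to userspace through a perf buffer, so the expensive L7 protocol parsing never runs in the kernel. The attach point, the l7_event layout, and the map names are assumptions made up for this example.

// Hypothetical sketch only; coroot-node-agent's real eBPF code differs.
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAX_PAYLOAD_SIZE 1024   // the cap discussed above

struct l7_event {               // assumed event layout for the example
    __u32 pid;
    __u32 size;                 // original payload length
    char payload[MAX_PAYLOAD_SIZE];
};

// perf buffer used to hand events to the userland agent
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(int));
} l7_events SEC(".maps");

// per-CPU scratch slot: a 1 KB struct does not fit on the 512-byte eBPF stack
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, int);
    __type(value, struct l7_event);
} scratch SEC(".maps");

SEC("kprobe/hypothetical_send_hook")    // placeholder attach point
int BPF_KPROBE(trace_send, const void *buf, size_t len)
{
    int zero = 0;
    struct l7_event *e = bpf_map_lookup_elem(&scratch, &zero);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->size = len;

    // Copy at most MAX_PAYLOAD_SIZE bytes; the kernel-side work stays
    // small and bounded, and anything beyond the cap is simply not observed.
    size_t to_copy = len;
    if (to_copy > MAX_PAYLOAD_SIZE)
        to_copy = MAX_PAYLOAD_SIZE;
    bpf_probe_read(e->payload, to_copy, buf);

    // Hand the (possibly truncated) payload to userspace, where the
    // actual L7 protocol parsing is done.
    bpf_perf_event_output(ctx, &l7_events, BPF_F_CURRENT_CPU, e, sizeof(*e));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

Under these assumptions, the per-request kernel-side overhead is essentially a map lookup plus a bounded copy of up to 1 KB; the heavier protocol parsing is deferred to the agent process in userland.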

def commented

I've implemented several performance optimizations using your benchmark approach.

  • without coroot-node-agent:
taskset -c 1-2 wrk -t 2 -c 4 -d 60s http://10.42.0.9:80/ --latency
Running 1m test @ http://10.42.0.9:80/
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   173.83us  114.66us   8.32ms   97.46%
    Req/Sec    11.64k     1.10k   14.84k    72.46%
  Latency Distribution
     50%  158.00us
     75%  185.00us
     90%  229.00us
     99%  365.00us
  1391869 requests in 1.00m, 1.10GB read
Requests/sec:  23159.30
Transfer/sec:     18.77MB
  • with coroot-node-agent v1.14.0 (with optimizations):
taskset -c 1-2 wrk -t 2 -c 4 -d 60s http://10.42.0.9:80/ --latency
Running 1m test @ http://10.42.0.9:80/
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   189.95us  129.27us   9.60ms   96.94%
    Req/Sec    10.62k     1.11k   15.71k    72.67%
  Latency Distribution
     50%  171.00us
     75%  207.00us
     90%  259.00us
     99%  414.00us
  1268200 requests in 1.00m, 1.00GB read
Requests/sec:  21136.58
Transfer/sec:     17.13MB

This 9% degradation in request throughput can be attributed to the agent's CPU consumption, which reaches 30% of one CPU core. I expect that without competition for CPU time the degradation would be much smaller. At the eBPF level, the kernel ensures that the observer program does not introduce significant additional latency.

@tanjunchen thank you for bringing up this topic.

def commented

We've added the benchmark results to the documentation: https://coroot.com/docs/coroot-community-edition/getting-started/performance-impact