Error invoking agent: unable to upgrade connection: container not found ("xk6-agent")
When injecting faults in the quick pizza application, the following error is reported:
ERRO[0000] GoError: error injecting fault: error invoking agent: unable to upgrade connection: container not found ("xk6-agent")
The fault injection is done with this code:
import { ServiceDisruptor } from "k6/x/disruptor";

export function disrupt() {
  const disruptor = new ServiceDisruptor("pizza-info", "pizza-ns");
  const targets = disruptor.targets();
  if (targets.length == 0) {
    throw new Error("expected list to have one target");
  }

  // Add an average delay of 1000ms to HTTP requests for 40s.
  disruptor.injectHTTPFaults({ averageDelay: "1000ms" }, "40s");
}
A quick inspection of the application reveals the target pods had restarted:
kubectl get all -n pizza-ns
NAME                             READY   STATUS    RESTARTS      AGE
pod/pizza-info-9b788866d-9m28p   1/1     Running   1 (26m ago)   6h9m
pod/pizza-info-9b788866d-xz9pw   1/1     Running   1 (26m ago)   6h9m

NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/pizza-info   LoadBalancer   10.98.215.80   127.0.0.1     3333:32247/TCP   6h9m

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pizza-info   2/2     2            2           6h9m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/pizza-info-9b788866d   2         2         2       6h9m
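For reference, a quick way to check whether the xk6-agent ephemeral container is still present in those pods is a jsonpath query like the one below (a sketch, using the app=pizza-info label that appears in the pod description further down):

# Print, for each pizza-info pod, the state of its ephemeral containers
# (empty output means no ephemeral container was ever injected).
kubectl -n pizza-ns get pods -l app=pizza-info \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.ephemeralContainerStatuses[*].state}{"\n"}{end}'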
As a side note, I can reproduce this message if I manually stop the xk6-disruptor ephemeral container. In this case I stopped it manually by running
kubectl -n pizza-ns attach -ti pizza-info-9b788866d-6vv5c -c xk6-agent
and then running exit in the shell.
Error and kubectl describe output for the pod in which I manually exited the container:
ERRO[0040] GoError: error injecting fault: error invoking agent: unable to upgrade connection: container not found ("xk6-agent")
  at reflect.methodValueCall (native)
  at disrupt (file:///home/roobre/Devel/quickpizza/k6/12.basic-with-delay.js:35:29(26))  executor=shared-iterations scenario=disrupt source=stacktrace
11:54:57 ~ $> kc describe -n pizza-ns pod pizza-info-9b788866d-6vv5c
Name:             pizza-info-9b788866d-6vv5c
Namespace:        pizza-ns
Priority:         0
Service Account:  pizza-info
Node:             minikube/192.168.49.2
Start Time:       Thu, 22 Jun 2023 11:48:16 +0200
Labels:           app=pizza-info
                  pod-template-hash=9b788866d
Annotations:      <none>
Status:           Running
IP:               10.244.0.14
IPs:
  IP:  10.244.0.14
Controlled By:  ReplicaSet/pizza-info-9b788866d
Containers:
  pizza-info:
    Container ID:   containerd://3da6065074e9b589a5eca6af23fc6625c8f5ddf93b87e34cf63054053cfd7245
    Image:          ghcr.io/grafana/quickpizza-local:latest
    Image ID:       ghcr.io/grafana/quickpizza-local@sha256:eb4d7eb611b52f25f3647888bce564185d16ce8a1cf2fdf8ec17ec1919cd3362
    Port:           3333/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 22 Jun 2023 11:48:17 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:             (v1:spec.nodeName)
      POD_NAME:              pizza-info-9b788866d-6vv5c (v1:metadata.name)
      POD_NAMESPACE:         pizza-ns (v1:metadata.namespace)
      POD_IP:                (v1:status.podIP)
      POD_SERVICE_ACCOUNT:   (v1:spec.serviceAccountName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xv8gf (ro)
Ephemeral Containers:
  xk6-agent:
    Container ID:   containerd://a5e96ac0221eceb4d6b869d0e3933a6ca62ca38703d436fc4c86ca624456cfe6
    Image:          ghcr.io/grafana/xk6-disruptor-agent:v0.3.1
    Image ID:       ghcr.io/grafana/xk6-disruptor-agent@sha256:67094b9d6fe81a57c6ab99d672acf9a31ff56033389dcb02b3679cd4f6815fc5
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 22 Jun 2023 11:49:10 +0200
      Finished:     Thu, 22 Jun 2023 11:50:36 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-xv8gf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m50s  default-scheduler  Successfully assigned pizza-ns/pizza-info-9b788866d-6vv5c to minikube
  Normal  Pulling    6m50s  kubelet            Pulling image "ghcr.io/grafana/quickpizza-local:latest"
  Normal  Pulled     6m49s  kubelet            Successfully pulled image "ghcr.io/grafana/quickpizza-local:latest" in 611.748637ms (611.810066ms including waiting)
  Normal  Created    6m49s  kubelet            Created container pizza-info
  Normal  Started    6m49s  kubelet            Started container pizza-info
  Normal  Pulled     5m56s  kubelet            Container image "ghcr.io/grafana/xk6-disruptor-agent:v0.3.1" already present on machine
  Normal  Created    5m56s  kubelet            Created container xk6-agent
  Normal  Started    5m56s  kubelet            Started container xk6-agent
I can reproduce this message if I manually stop the xk6-disruptor ephemeral container. In this case I stopped it manually by running
kubectl -n pizza-ns attach -ti pizza-info-9b788866d-6vv5c -c xk6-agent
and then running exit in the shell.
This is weird because I've done it many times when debugging or testing the agent manually. This is even documented.
I wonder if this is a regression.
@pablochacin I think that by doing attach instead of exec, I can kill the container on exit. I believe this happens because when I run attach I hook into the shell that is running as the container entrypoint, instead of creating a new one like exec does.
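For comparison, a rough sketch of the two commands against the same pod (assuming the agent image provides a shell for the exec case):

# attach connects to the shell already running as the container's entrypoint;
# typing exit terminates that process and, with it, the xk6-agent container.
kubectl -n pizza-ns attach -ti pizza-info-9b788866d-6vv5c -c xk6-agent

# exec starts a new shell inside the container; exiting it leaves the
# entrypoint process (and therefore the container) running.
kubectl -n pizza-ns exec -ti pizza-info-9b788866d-6vv5c -c xk6-agent -- sh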
"I think that by doing attach instead of exec"

I haven't noticed the attach
I have found this same behavior in an unrelated test using the Inject HTTP faults example. The test used a kind cluster deployed following the instructions found at the xk6-disruptor-demo.
In this case, the ephemeral container seems to have terminated "spontaneously":
ephemeralContainerStatuses:
- containerID: containerd://82214f374b6e0924de886d14fd83f3be83f746b0c63255e9ceb47bb568a7467e
  image: ghcr.io/grafana/xk6-disruptor-agent:latest
  imageID: ghcr.io/grafana/xk6-disruptor-agent@sha256:146dcfb9636e0be662a501171ef2bdcaa8e1b67ec1a05a99a0818495303637b3
  lastState: {}
  name: xk6-agent
  ready: false
  restartCount: 0
  state:
    terminated:
      containerID: containerd://82214f374b6e0924de886d14fd83f3be83f746b0c63255e9ceb47bb568a7467e
      exitCode: 255
      finishedAt: "2023-06-21T16:21:17Z"
      reason: Unknown
      startedAt: "2023-06-21T12:13:24Z"
Something I noticed is that the kind cluster apparently was restarted at some point. This can be seen from the difference between the created and up times reported by docker for the container on which the test cluster runs:
docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED        STATUS        PORTS                                              NAMES
ef502cd89b15   kindest/node:v1.24.0   "/usr/local/bin/entr…"   23 hours ago   Up 18 hours   0.0.0.0:38080->80/tcp, 127.0.0.1:35971->6443/tcp   demo-control-plane
Coincidentally, the up time matches the uptime for my workstation:
uptime
12:44:50 up 18:24, 1 user, load average: 2,57, 2,94, 2,79
I can reproduce this as well. Stopping the cluster (e.g. minikube stop) and starting it again will cause the ephemeral container to exit, which in turn causes the error described in the issue.
I guess this, in fact, happens to all containers when the CRI stops. The difference might be that the kubelet does restart the main container, but not the ephemeral one.
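A rough way to confirm that on the setup above (a sketch; the pod name is the one from the kubectl get all output, and a minikube cluster is assumed):

# Restart the cluster; the CRI stops and takes all containers down with it.
minikube stop && minikube start

# Afterwards the main container reports a restart, while the ephemeral agent
# container stays terminated and is not restarted by the kubelet.
kubectl -n pizza-ns get pod pizza-info-9b788866d-9m28p \
  -o jsonpath='{.status.containerStatuses[0].restartCount} {.status.ephemeralContainerStatuses[0].state}'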