weibeld/k1s

k1s script stopped updating pods output

jgilfoil opened this issue · 1 comments

So yesterday I installed the k1s script on one of my k3s nodes. Really great simple concept, and exactly what I wanted for my extra monitor attached to my cluster. I created a tmux session with 12 different windows, each with an instance of the k1s script running, displaying different resources.

It appeared to be working, but I hadn't yet tested a change to watch it update in real time. I left it running overnight and came back to it 12 hours later. I then deployed an updated version of kured to my cluster, which updated the DaemonSet and its pods. However, the tmux window displaying the kube-system pods did not reflect the change. After waiting a few minutes and verifying in another window that the kured pods had been replaced, I pressed Ctrl+C to exit k1s and reran the k1s kube-system pods command. It then showed the new pod names for the kured DaemonSet. I also tested a further change by deleting one of the kured DaemonSet pods, and the replacement showed up in real time in the k1s output.

So, it's only been one instance of this issue, and I'm wondering if there's some timeout or limit in the Kubernetes API that I'm not aware of. If not, any advice on what information to collect or how to troubleshoot this? I'm thinking about just modifying the bash script to enable tracing (set -x) and add some logging to a file, but thought I'd ask if you had any better ideas.

If I find the problem or a solution, I'll be happy to submit a pull request.

It's most probably a connection timeout. The script uses kubectl proxy to create a connection to the Kubernetes API server, and then through this connection makes a watch request to the Kubernetes API.
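
For reference, the proxy-plus-watch pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual k1s code: the function names are hypothetical, and the port, namespace and resource are passed in as arguments rather than taken from the script. It assumes `kubectl` and `curl` are installed.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the fixed startup delay are assumptions,
# not taken from the k1s script.

# Build the watch URL for a namespaced core resource behind a local kubectl proxy.
watch_url() {
  local port=$1 namespace=$2 resource=$3
  printf 'http://127.0.0.1:%s/api/v1/namespaces/%s/%s?watch=true\n' \
    "$port" "$namespace" "$resource"
}

# Start the proxy in the background, then stream watch events through it.
# The API server sends one JSON event per line until the connection ends.
watch_once() {
  local port=$1 namespace=$2 resource=$3
  kubectl proxy --port="$port" >/dev/null &
  local proxy_pid=$!
  sleep 1   # crude wait for the proxy to start listening
  curl -sN "$(watch_url "$port" "$namespace" "$resource")"
  kill "$proxy_pid" 2>/dev/null
}
```

Calling `watch_once 8001 kube-system pods` would stream pod events until either the proxy connection or the watch request is closed, which is where a silent timeout could leave the display stale.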

So there are two places where the timeout might occur: the proxy connection or the watch API request. The timeout is most probably initiated by the Kubernetes API server, i.e. it depends on your specific Kubernetes configuration.

This would require further investigation. However, there might be ways to bypass the problem, for example by having the script renew the connection on a regular basis (e.g. every 5 minutes).
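
A minimal sketch of the periodic-renewal idea, assuming GNU coreutils `timeout` is available. The function name `renew_watch` and its arguments are hypothetical, and this is not how k1s currently works:

```shell
#!/usr/bin/env bash
# Sketch only: renew_watch is a hypothetical helper, not part of k1s.

# Run a streaming command repeatedly, cutting it off after `interval` seconds
# each time so that a silently stalled connection is replaced by a fresh one.
# rounds=0 means loop forever.
renew_watch() {
  local interval=$1 rounds=$2
  shift 2
  local i=0 status
  while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
    status=0
    timeout "$interval" "$@" || status=$?
    # 124 means timeout ended the command on schedule; any other failure
    # gets a short pause so a dead proxy does not cause a busy loop.
    if [ "$status" -ne 0 ] && [ "$status" -ne 124 ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
}
```

For example, `renew_watch 300 0 curl -sN "http://127.0.0.1:8001/api/v1/namespaces/kube-system/pods?watch=true"` would reopen the watch every 5 minutes (port 8001 is an assumed proxy port). Note that a watch started without a resource version begins by re-sending the existing objects as ADDED events, so whatever consumes the stream would need to reset its state on each renewal.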

I'll leave this open for future work.