coryodaniel/k8s

Does the watch stream handle disconnects in V2?


First of all, thank you for a super useful and well-implemented library.
I have a question regarding Watch.Stream.
In V1, the function watch_and_stream would automatically handle timeouts from the Kubernetes API. Looking at the current implementation, it is a bit unclear whether we should handle this ourselves or whether the library somehow does it for us.
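For context, a minimal sketch of how I start the watch in V2 (assuming the documented K8s.Client.watch/3 and K8s.Client.stream/2 entry points):

```elixir
# Minimal sketch of a v2 watch, assuming the documented K8s.Client API.
{:ok, conn} = K8s.Conn.from_file("~/.kube/config")

operation = K8s.Client.watch("v1", "Pod", namespace: "default")
{:ok, stream} = K8s.Client.stream(conn, operation)

stream
|> Stream.each(&IO.inspect/1)
|> Stream.run()
```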

Hi @Hanspagh
It definitely should. If it doesn't, I'd consider it a bug. Do you have any indication that it doesn't, or do you just wanna make sure?

I do not have anything concrete, just the indication that after a while (half a day) I do not seem to be getting any new events. I am using Kubernetes in Azure; with Python we have noticed we need to handle timeouts explicitly since they would shut down our watchers. So I was just wondering if the same could be happening here.

Half a day? Very strange... But I'm gonna have a look...

I have enabled debug logging to see if I can get a bit more information. I know the Kubernetes API server sets a semi-random timeout on all watch requests to spread out the load:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/ (--min-request-timeout).

In V1 I could see you handle the timeout explicitly here, but I couldn't find the corresponding code in V2; that is what got me wondering.

Hey @Hanspagh, I've tried to reproduce this on a local cluster. See my findings:

Steps

  • I have set up a local cluster using kind.
  • Using docker exec, I've updated /etc/kubernetes/manifests/kube-apiserver.yaml, adding - --min-request-timeout=5 to the container command (see the manifest fragment after this list).
  • Restarted the Docker container.
  • Started a watch (with debug statements).
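For reference, the relevant fragment of the edited manifest looks roughly like this (everything else left unchanged):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (fragment)
spec:
  containers:
    - command:
        - kube-apiserver
        - --min-request-timeout=5   # force frequent watch-request timeouts
        # ... existing flags unchanged ...
```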

Behaviour

Every 6-7s, the watcher receives a BOOKMARK, followed by a :done. The :done is sent by the Mint adapter to signal that the request ended (in this case, probably because of the timeout). Upon receiving a :done, the watcher resumes the watch.
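Conceptually, the resume behaviour can be pictured like this. This is a toy sketch, not the library's actual code; request_fun is a made-up stand-in for issuing the underlying watch request:

```elixir
# Toy sketch of resume-on-:done; `request_fun.(resource_version)` is a
# hypothetical stand-in for a watch request and returns a list of events
# terminated by :done.
defmodule ResumeSketch do
  def watch(request_fun, resource_version \\ nil) do
    Stream.resource(
      fn -> {request_fun.(resource_version), resource_version} end,
      fn {events, rv} ->
        case events do
          # request ended (e.g. min-request-timeout hit): resume from last rv
          [:done | _] ->
            {[], {request_fun.(rv), rv}}

          # bookmarks carry the latest resourceVersion without a real event
          [%{"type" => "BOOKMARK"} = ev | rest] ->
            {[ev], {rest, get_in(ev, ["object", "metadata", "resourceVersion"])}}

          [ev | rest] ->
            {[ev], {rest, rv}}
        end
      end,
      fn _ -> :ok end
    )
  end
end
```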

Conclusion

I have to assume the watcher works as expected.

Now what...

I'm still curious, however, why you stop receiving events. Does the debug logging shed any light?

I do not seem to be getting any new events.

Is the process running the stream still alive at that point?

I am using Kubernetes in Azure

In what version?

Thank you for investigating this. I will report back once this happens again; it should not take more than a day.

I am using Flow on top of the stream to capture updates over a time window, so maybe something in there is the cause of the problem.
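Roughly like this, as a sketch; the 30-second window and the aggregation are placeholders:

```elixir
# Sketch of windowed aggregation over the watch stream with Flow; `stream`
# comes from K8s.Client.stream/2, and the window length is arbitrary.
window = Flow.Window.periodic(30, :second)

stream
|> Flow.from_enumerable(max_demand: 1)
|> Flow.partition(window: window, stages: 1)
|> Flow.reduce(fn -> [] end, fn event, acc -> [event | acc] end)
|> Flow.on_trigger(fn events -> {[Enum.reverse(events)], []} end)
|> Enum.each(fn batch -> IO.puts("#{length(batch)} events in window") end)
```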

Could it be the API server going away, because of maintenance or so? But that would not occur on a daily basis, I guess...

I enabled debug logging, and the streams have now been running fine for 48 hours. This might have been an Azure Kubernetes thing after all. Sorry for the inconvenience. I will comment or reopen if I find out at some point what the cause was.

I'm gonna have to re-open this. It was right in front of me and I did not see it. When the server goes away (in my test I can simulate this by restarting the docker container running the cluster), k8s doesn't always recognise this. Since the connection is cached, it tries to make requests using the same (closed) connection. Or worse, the stream just stays open (like in your case).
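Until a proper fix lands, a consumer-side workaround could be to restart the watch when nothing (not even a bookmark) arrives within a deadline. A rough sketch; WatchGuard and handle_event/1 are made up, and the idle timeout must comfortably exceed the bookmark interval:

```elixir
# Sketch of a consumer-side liveness guard (not the library's fix): restart
# the watch if nothing arrives within `idle_timeout`, treating a silent
# stream as a dead connection.
defmodule WatchGuard do
  def run(conn, operation, idle_timeout \\ :timer.minutes(10)) do
    parent = self()

    # Unlinked task, so killing it on timeout doesn't take the guard down.
    {:ok, task} =
      Task.start(fn ->
        {:ok, stream} = K8s.Client.stream(conn, operation)
        Enum.each(stream, &send(parent, {:event, &1}))
      end)

    loop(task, conn, operation, idle_timeout)
  end

  defp loop(task, conn, operation, idle_timeout) do
    receive do
      {:event, event} ->
        handle_event(event)
        loop(task, conn, operation, idle_timeout)
    after
      idle_timeout ->
        # No event within the deadline: assume a dead connection and restart.
        Process.exit(task, :kill)
        run(conn, operation, idle_timeout)
    end
  end

  defp handle_event(event), do: IO.inspect(event, label: "watch event")
end
```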

Ahh, interesting. I guess that could have happened to my cluster.

I'm pretty sure about it. I'm trying to fix this, but I'm struggling and currently have limited time. But it's an important requirement for Bonny.

OK, I think I have working code. Starting a PR soon.

This should be fixed in 2.0.3. @Hanspagh could you please keep me posted? Thanks!