v0.16.0 - Websocket error
We are seeing the following errors looping in the logs:
19:46:39.420701 client.go:359: config-poller INFO: connection sucessfully opened to config discovery server at "ws://10.155.11.217:13478/api/v1/config/watch?id=test%2Fstunner-udp-gateway"
19:46:39.421414 reconcile.go:113: stunner INFO: setting loglevel to "all:INFO"
19:46:39.421605 reconcile.go:177: stunner INFO: reconciliation ready: new objects: 0, changed objects: 1, deleted objects: 0, started objects: 0, restarted objects: 0
19:46:39.421653 reconcile.go:181: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: longterm, listeners: test/stunner-udp-gateway/udp-listener: [turn-udp://10.x.x.x:3478<32768:65535>], active allocations: 0
19:46:40.422774 client.go:334: config-poller ERROR: config file discovery service: websocket: close 1006 (abnormal closure): unexpected EOF
We are running v0.16.0 because our current GKE cluster doesn't support the v1 Gateway API. It looks like the CDS process has been rewritten in later versions, but are there any workarounds in the meantime?
Not that I know of. Managed mode was fairly new in v0.16, and we have almost completely rewritten it over the last two releases precisely to eliminate the instability you are experiencing.
I see two alternatives for now:
- Revert to the unmanaged (legacy) dataplane mode: this was the default in v0.16 anyway and it was rock solid at that point. This will most probably require a full reinstall, though: https://docs.l7mp.io/en/v0.16.0/INSTALL/#basic-installation. Compared to the now-default managed mode it is a massive step back, but it is at least super-reliable: I know of a lot of users who still run v0.16 for exactly this reason.
- Move from GKE Autopilot to GKE standard-mode clusters and upgrade to STUNner v0.18. Standard-mode clusters do not auto-enable Google's own version of the Gateway API, so you can safely install v0.18 there (make sure to untick the Gateway API checkbox when provisioning the cluster). Depending on how much you rely on the Autopilot pricing model and how many services you already run in your cluster, this may be the better option for now if you want the latest STUNner.
We're terribly sorry for this situation; we understand how unpleasant this state of affairs is for our users. We are at Google's mercy at this point: let's hope they upgrade to v1 quickly. The good news is that we're not alone.
Thanks for the advice. I'm not sure whether #136 is related to this as well. I will move to standalone mode and confirm whether that resolves both issues.
This looks to have resolved the issue so I'm going to close it off.