K8S Cluster Upgrade and Connectivity Problems with Talosctl
Environment Details
- Freshly installed K8S cluster using the following Terraform configuration:
module "talos" {
  source             = "hcloud-talos/talos/hcloud"
  version            = "v2.12.0"
  talos_version      = "v1.8.4"
  kubernetes_version = "1.31.1"
  ...
}
Problem Description
Terraform's Kubernetes Version Parameter
- Updating the kubernetes_version in the Terraform configuration does not seem to have any effect.
- I understand this is expected, and the Kubernetes version must be updated using the talosctl command.
Version Check Behavior
- Exported the Talos configuration and verified the version using:
  talosctl --talosconfig $TALOSCONFIG --nodes <control-plane-ip> version
- Results:
  - Sometimes the output is as expected:
    Client:
        Tag:         v1.8.4
        SHA:         undefined
        Built:       2024-12-12T18:49:17Z
        Go version:  go1.23.4
        OS/Arch:     darwin/arm64
    Server:
        NODE:        <control-plane-ip>
        Tag:         v1.8.4
        SHA:         3c151c8a
        Built:
        Go version:  go1.22.10
        OS/Arch:     linux/arm64
        Enabled:     RBAC
  - Frequently, however, I encounter the following error:
    Client:
        Tag:         v1.8.4
        SHA:         undefined
        Built:       2024-12-12T18:49:17Z
        Go version:  go1.23.4
        OS/Arch:     darwin/arm64
    Server:
    error getting version: 1 error occurred:
        * <control-plane-ip>: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"
Upgrade Kubernetes Version
- Attempted to upgrade Kubernetes version using the following command:
  talosctl --talosconfig $TALOSCONFIG --nodes <control-plane-ip> upgrade-k8s --to 1.31.3
- The upgrade process fails, and I observe two different errors intermittently:
  Error 1:
  error detecting the lowest Kubernetes version
  Get "https://kube.cluster.local:6443/api/v1/namespaces/kube-system/pods": dial tcp: lookup kube.cluster.local: no such host
  Error 2:
  error detecting the lowest Kubernetes version
  error building kubernetes client: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"
Expected Behavior
- Changing the kubernetes_version in Terraform should either:
  - Be reflected in the deployed cluster, or
  - Be clearly documented as unsupported, requiring manual intervention.
- The talosctl commands should reliably interact with the cluster for both version checks and upgrades.
Questions
- How can I ensure reliable connectivity to the cluster using talosctl?
- What is the proper way to upgrade the Kubernetes version in this setup?
- Are there any known issues or prerequisites I might be missing that cause the rpc error or DNS resolution failures?
When did you set up the cluster? There was a bug where the floating IP was used instead of a fixed IP in the talosconfig.
Fixed here: 41bcbfa
Please check the IP in talosconfig.
This could explain the connection error, but it is not necessarily the cause.
Upgrading Kubernetes is quite simple and stable: https://www.talos.dev/v1.9/kubernetes-guides/upgrading-kubernetes/
Not that I'm aware of.
I encountered the following error when working with Kubernetes:
error detecting the lowest Kubernetes version
error building kubernetes client: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"
Observations:
- This error seems to be related to a firewall configuration. When I extend the Hetzner firewall rules to allow traffic from any IPv4 and any IPv6 address, the timeout issue disappears.
- My operating system is macOS.
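To tell a firewall or routing problem apart from a talosctl problem, it can help to probe the Talos API port directly and count how often the TCP connection succeeds. The following is a minimal sketch, assuming Python 3 is available on the client machine. The port 50000 is taken from the error message above; `probe_tcp` is a hypothetical helper name and is not part of talosctl.

```python
import socket
import time


def probe_tcp(host, port=50000, attempts=5, timeout=3.0, pause=0.0):
    """Try to open a TCP connection several times.

    Returns a list of booleans, one per attempt: True if the connection
    was established, False if it timed out or was refused.
    """
    results = []
    for _ in range(attempts):
        try:
            # create_connection raises OSError (including timeouts) on failure
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            results.append(False)
        time.sleep(pause)
    return results
```

Calling `probe_tcp("<control-plane-ip>", pause=1.0)` from the affected machine returns one entry per attempt. A mix of `True` and `False` would be consistent with the intermittent timeouts reported here, while all `False` suggests that the client address is not allowed through the firewall at all.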
Persistent Issue
Even after resolving the timeout, another error persists. I followed the official Talos documentation and executed the command:
talosctl --nodes <control-plane-ip> upgrade-k8s --to 1.31.3
The command fails with:
error detecting the lowest Kubernetes version
Get "https://kube.cluster.local:6443/api/v1/namespaces/kube-system/pods": dial tcp: lookup kube.cluster.local: no such host
Problem
Because of this error, the Kubernetes upgrade cannot be performed.
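The `no such host` message indicates that the name kube.cluster.local could not be resolved, apparently on the machine where talosctl runs. A quick way to confirm this independently of talosctl is to ask the local resolver directly. This is a minimal sketch, assuming Python 3 on the client; the hostname and port 6443 come from the error above, and `resolve_host` is a hypothetical helper name.

```python
import socket


def resolve_host(hostname, port=6443):
    """Return the sorted list of IP addresses the local resolver gives
    for hostname, or an empty list if the name cannot be resolved."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Raised when the resolver has no answer ("no such host")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

If `resolve_host("kube.cluster.local")` returns an empty list, the name is not known to the client's DNS or hosts file, which matches the error. A non-empty list would point to a different cause.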