hcloud-talos/terraform-hcloud-talos

K8S Cluster Upgrade and Connectivity Problems with Talosctl


Environment Details

  • Freshly installed K8S cluster using the following Terraform configuration:
    module "talos" {
      source  = "hcloud-talos/talos/hcloud"
      version = "v2.12.0"
    
      talos_version      = "v1.8.4"
      kubernetes_version = "1.31.1"
      ...
    }

Problem Description

  1. Terraform's Kubernetes Version Parameter

    • Updating the kubernetes_version in the Terraform configuration does not seem to have any effect.
    • I understand this is expected, and the Kubernetes version must be updated using the talosctl command.
  2. Version Check Behavior

    • Exported the Talos configuration and verified the version using:
      talosctl --talosconfig $TALOSCONFIG --nodes <control-plane-ip> version
    • Results:
      • Sometimes the output is as expected:
        Client:
                Tag:         v1.8.4
                SHA:         undefined
                Built:       2024-12-12T18:49:17Z
                Go version:  go1.23.4
                OS/Arch:     darwin/arm64
        Server:
                NODE:        <control-plane-ip>
                Tag:         v1.8.4
                SHA:         3c151c8a
                Built:       
                Go version:  go1.22.10
                OS/Arch:     linux/arm64
                Enabled:     RBAC
        
      • Frequently, however, I encounter the following error:
        Client:
                Tag:         v1.8.4
                SHA:         undefined
                Built:       2024-12-12T18:49:17Z
                Go version:  go1.23.4
                OS/Arch:     darwin/arm64
        Server:
        error getting version: 1 error occurred:
                * <control-plane-ip>: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"
        
  3. Upgrade Kubernetes Version

    • Attempted to upgrade the Kubernetes version using the following command:

      talosctl --talosconfig $TALOSCONFIG --nodes <control-plane-ip> upgrade-k8s --to 1.31.3
    • The upgrade process fails, and I observe two different errors intermittently:

      Error 1:

      error detecting the lowest Kubernetes version Get "https://kube.cluster.local:6443/api/v1/namespaces/kube-system/pods": dial tcp: lookup kube.cluster.local: no such host
      

      Error 2:

      error detecting the lowest Kubernetes version error building kubernetes client: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"
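Because the timeouts come and go, it can help to measure how often the version check from step 2 fails before changing anything. The sketch below uses a hypothetical helper, `count_version_failures`, which is not part of talosctl. It assumes that `talosctl version` exits with a non-zero status when it cannot reach the node, and that `$TALOSCONFIG` is exported as in the commands above.

```shell
#!/bin/sh
# count_version_failures NODE [ATTEMPTS]
# Hypothetical helper: repeats the version check ATTEMPTS times (default 10)
# and prints how many attempts failed. Assumes talosctl exits non-zero when
# the node cannot be reached.
count_version_failures() {
  node=$1
  attempts=${2:-10}
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Only the exit status matters here, so the output is discarded.
    if ! talosctl --talosconfig "$TALOSCONFIG" --nodes "$node" version >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$attempts attempts failed"
}
```

For example, `count_version_failures <control-plane-ip> 20` gives a rough failure rate that can be compared before and after a configuration change.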
      

Expected Behavior

  • Changing the kubernetes_version in Terraform should either:
    • Reflect in the deployed cluster, or
    • Be clearly documented as unsupported, requiring manual intervention.
  • The talosctl commands should reliably interact with the cluster for both version checks and upgrades.

Questions

  1. How can I ensure reliable connectivity to the cluster using talosctl?
  2. What is the proper way to upgrade the Kubernetes version in this setup?
  3. Are there any known issues or prerequisites I might be missing that cause the rpc error or DNS resolution failures?

When did you set up the cluster? There was a bug where the floating IP was written to the talosconfig instead of a fixed IP.

Fixed here: 41bcbfa

Please check the IP in your talosconfig.
A stale floating IP there could explain the connection error, but it is not necessarily the cause.
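One way to check is to list the endpoints recorded in the talosconfig and compare them with the servers' fixed IPs in the Hetzner Cloud console. `talosctl config info` shows the endpoints of the current context; the awk sketch below lists them for every context. `talosconfig_endpoints` is a hypothetical helper, and it assumes the block-style YAML layout that talosctl writes (each endpoint on its own `- ` line under `endpoints:`). It is not a general YAML parser.

```shell
#!/bin/sh
# talosconfig_endpoints [FILE]
# Hypothetical helper: prints every entry listed under an "endpoints:" key
# in a talosconfig (defaults to $TALOSCONFIG), one per line.
talosconfig_endpoints() {
  awk '
    /^[ \t]*endpoints:/ { in_list = 1; next }   # start of an endpoints list
    in_list && /^[ \t]*- / {                    # list item: strip the "- " prefix
      sub(/^[ \t]*- /, "")
      print
      next
    }
    { in_list = 0 }                             # any other line ends the list
  ' "${1:-$TALOSCONFIG}"
}
```

If an endpoint turns out to be the floating IP, `talosctl --talosconfig $TALOSCONFIG config endpoint <fixed-ip>` sets the endpoints of the current context to the given address.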

This is quite simple and stable: https://www.talos.dev/v1.9/kubernetes-guides/upgrading-kubernetes/
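Following the linked guide, a cautious sequence is to print the upgrade plan with `--dry-run` first and run the real upgrade only if that succeeds. `upgrade_k8s` is a hypothetical wrapper around those two commands; as in the commands above, the node passed in should be a control plane node.

```shell
#!/bin/sh
# upgrade_k8s CONTROL_PLANE_IP TARGET_VERSION
# Hypothetical wrapper: shows the upgrade plan first (--dry-run makes no
# changes) and only runs the real upgrade if the dry run succeeded.
upgrade_k8s() {
  node=$1
  target=$2
  talosctl --talosconfig "$TALOSCONFIG" --nodes "$node" upgrade-k8s --to "$target" --dry-run &&
    talosctl --talosconfig "$TALOSCONFIG" --nodes "$node" upgrade-k8s --to "$target"
}
```

For example, `upgrade_k8s <control-plane-ip> 1.31.3`.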

Not that I'm aware of.

I encountered the following error when running talosctl upgrade-k8s:

error detecting the lowest Kubernetes version error building kubernetes client: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <control-plane-ip>:50000: i/o timeout"

Observations:

  1. This error seems to be related to a firewall configuration. When I extend the Hetzner firewall rules to allow traffic from any IPv4 and any IPv6 address, the timeout issue disappears.
  2. My operating system is macOS.
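To confirm whether the firewall is what blocks the client, one can test raw TCP reachability of the two ports that appear in the errors: 50000 (the Talos API that talosctl dials) and 6443 (the Kubernetes API). This is a minimal sketch that assumes bash is installed (it is on macOS), since the connect uses bash's `/dev/tcp` redirection; `port_open` is a hypothetical helper. A connection that hangs until the timeout is typical of a firewall silently dropping packets, which matches the `i/o timeout` above, whereas a closed port usually fails immediately.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT completes
# within the timeout (default 5 seconds).
port_open() {
  host=$1
  port=$2
  secs=${3:-5}
  # Opening /dev/tcp/HOST/PORT is a bash feature that attempts a TCP connect.
  bash -c ": </dev/tcp/$host/$port" 2>/dev/null &
  pid=$!
  ticks=0
  while [ "$ticks" -lt $((secs * 10)) ]; do
    if ! kill -0 "$pid" 2>/dev/null; then
      # The connect attempt finished: return its status (0 = open).
      wait "$pid"
      return
    fi
    sleep 0.1
    ticks=$((ticks + 1))
  done
  # Still waiting after the timeout: give up and report failure.
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  return 1
}
```

For example, `port_open <control-plane-ip> 50000 && echo reachable`, run once with the restrictive firewall rules and once with the widened ones.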

Persistent Issue

Even after resolving the timeout, another error persists. Following the official Talos documentation, I executed:

talosctl --nodes <control-plane-ip> upgrade-k8s --to 1.31.3

This fails with the following error:

error detecting the lowest Kubernetes version Get "https://kube.cluster.local:6443/api/v1/namespaces/kube-system/pods": dial tcp: lookup kube.cluster.local: no such host

Problem

Due to the above error, the Kubernetes update is not possible.
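This remaining error is a name-resolution failure on the workstation: talosctl tries to reach the Kubernetes API at `kube.cluster.local`, and the client cannot resolve that name. One workaround that may be worth trying (untested here) is to map the name to a control plane node's fixed IP in the workstation's `/etc/hosts`. This assumes the API server certificate is valid for `kube.cluster.local`, which should hold when it is the configured cluster endpoint. `ensure_hosts_entry` is a hypothetical helper that appends the mapping only if the name is not already listed.

```shell
#!/bin/sh
# ensure_hosts_entry IP NAME [HOSTS_FILE]
# Hypothetical helper: appends "IP<TAB>NAME" to the hosts file (default
# /etc/hosts) unless a non-comment line already lists NAME.
ensure_hosts_entry() {
  ip=$1
  name=$2
  file=${3:-/etc/hosts}
  if awk -v n="$name" '
       /^[ \t]*#/ { next }                                  # skip comment lines
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # NAME in a host column?
       END { exit !found }                                  # exit 0 if already mapped
     ' "$file"; then
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

Writing `/etc/hosts` requires root, so the function has to run under sudo; passing a scratch file as the third argument lets you try it safely first.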