k0sproject/k0sctl

k0sctl init doesn't result in a working cluster

RobertMirantis opened this issue · 5 comments

k0sctl init > cluster.yaml

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 10.0.0.1
      user: root
      port: 22
      keyPath: null
    role: controller
  - ssh:
      address: 10.0.0.2
      user: root
      port: 22
      keyPath: null
    role: worker
  k0s:
    version: 1.28.5+k0s.0
    dynamicConfig: false

The generated file is missing all of the k0s configuration pieces.
The cluster itself gets created successfully (on AWS Ubuntu images).

But when deploying a simple nginx workload, the nginx pod starts successfully, yet you can't connect to it or view its logs.
It looks promising, but it doesn't actually operate (at all!).

STEP 1: Create a template file

[ec2-user@ip-172-31-21-103 scripts]$ k0sctl init > cluster2.yaml
[ec2-user@ip-172-31-21-103 scripts]$ cat cluster2.yaml 
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 10.0.0.1
      user: root
      port: 22
      keyPath: null
    role: controller
  - ssh:
      address: 10.0.0.2
      user: root
      port: 22
      keyPath: null
    role: worker
  k0s:
    version: 1.28.5+k0s.0
    dynamicConfig: false

STEP 2: Fill in the addresses, keyPath and user.
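
For example, a filled-in controller entry might look like this (the user and key path here are placeholders for illustration; use whatever matches your AMI and SSH setup):

  - ssh:
      address: 172.31.20.207        # private IP of the controller node
      user: ubuntu                  # example: default user on Ubuntu AMIs
      port: 22
      keyPath: ~/.ssh/my-key.pem    # example path to the SSH private key
    role: controller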

STEP 3: Run k0sctl

[ec2-user@ip-172-31-21-103 scripts]$ k0sctl apply --config cluster2.yaml

⠀⣿⣿⡇⠀⠀⢀⣴⣾⣿⠟⠁⢸⣿⣿⣿⣿⣿⣿⣿⡿⠛⠁⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀█████████ █████████ ███
⠀⣿⣿⡇⣠⣶⣿⡿⠋⠀⠀⠀⢸⣿⡇⠀⠀⠀⣠⠀⠀⢀⣠⡆⢸⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀███          ███    ███
⠀⣿⣿⣿⣿⣟⠋⠀⠀⠀⠀⠀⢸⣿⡇⠀⢰⣾⣿⠀⠀⣿⣿⡇⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀███          ███    ███
⠀⣿⣿⡏⠻⣿⣷⣤⡀⠀⠀⠀⠸⠛⠁⠀⠸⠋⠁⠀⠀⣿⣿⡇⠈⠉⠉⠉⠉⠉⠉⠉⠉⢹⣿⣿⠀███          ███    ███
⠀⣿⣿⡇⠀⠀⠙⢿⣿⣦⣀⠀⠀⠀⣠⣶⣶⣶⣶⣶⣶⣿⣿⡇⢰⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⠀█████████    ███    ██████████
k0sctl v0.15.4 Copyright 2023, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
WARN An old cache directory still exists at /home/ec2-user/.k0sctl/cache, k0sctl now uses /home/ec2-user/.cache/k0sctl 
INFO ==> Running phase: Connect to hosts 
INFO [ssh] 172.31.26.180:22: connected            
INFO [ssh] 172.31.23.3:22: connected              
INFO [ssh] 172.31.21.101:22: connected            
INFO [ssh] 172.31.20.207:22: connected            
INFO [ssh] 172.31.23.87:22: connected             
INFO [ssh] 172.31.22.124:22: connected            
INFO ==> Running phase: Detect host operating systems 
INFO [ssh] 172.31.20.207:22: is running Ubuntu 18.04.5 LTS 
INFO [ssh] 172.31.23.87:22: is running Ubuntu 18.04.5 LTS 
INFO [ssh] 172.31.23.3:22: is running Ubuntu 18.04.5 LTS 
INFO [ssh] 172.31.22.124:22: is running Ubuntu 18.04.5 LTS 
INFO [ssh] 172.31.26.180:22: is running Ubuntu 18.04.5 LTS 
INFO [ssh] 172.31.21.101:22: is running Ubuntu 18.04.5 LTS 
INFO ==> Running phase: Acquire exclusive host lock 
INFO ==> Running phase: Prepare hosts    
INFO ==> Running phase: Gather host facts 
INFO [ssh] 172.31.23.87:22: using ip-172-31-23-87 as hostname 
INFO [ssh] 172.31.26.180:22: using ip-172-31-26-180 as hostname 
INFO [ssh] 172.31.22.124:22: using ip-172-31-22-124 as hostname 
INFO [ssh] 172.31.23.3:22: using ip-172-31-23-3 as hostname 
INFO [ssh] 172.31.21.101:22: using ip-172-31-21-101 as hostname 
INFO [ssh] 172.31.20.207:22: using ip-172-31-20-207 as hostname 
INFO [ssh] 172.31.22.124:22: discovered ens5 as private interface 
INFO [ssh] 172.31.23.3:22: discovered ens5 as private interface 
INFO [ssh] 172.31.23.87:22: discovered ens5 as private interface 
INFO [ssh] 172.31.20.207:22: discovered ens5 as private interface 
INFO [ssh] 172.31.26.180:22: discovered ens5 as private interface 
INFO [ssh] 172.31.21.101:22: discovered ens5 as private interface 
INFO ==> Running phase: Validate hosts   
INFO ==> Running phase: Gather k0s facts 
INFO ==> Running phase: Validate facts   
INFO ==> Running phase: Configure k0s    
WARN [ssh] 172.31.20.207:22: generating default configuration 
INFO [ssh] 172.31.23.3:22: validating configuration 
INFO [ssh] 172.31.20.207:22: validating configuration 
INFO [ssh] 172.31.22.124:22: validating configuration 
INFO [ssh] 172.31.23.3:22: configuration was changed 
INFO [ssh] 172.31.20.207:22: configuration was changed 
INFO [ssh] 172.31.22.124:22: configuration was changed 
INFO ==> Running phase: Initialize the k0s cluster 
INFO [ssh] 172.31.20.207:22: installing k0s controller 
INFO [ssh] 172.31.20.207:22: waiting for the k0s service to start 
INFO [ssh] 172.31.20.207:22: waiting for kubernetes api to respond 
INFO ==> Running phase: Install controllers 
INFO [ssh] 172.31.20.207:22: generating token     
INFO [ssh] 172.31.23.3:22: writing join token     
INFO [ssh] 172.31.23.3:22: installing k0s controller 
INFO [ssh] 172.31.23.3:22: starting service       
INFO [ssh] 172.31.23.3:22: waiting for the k0s service to start 
INFO [ssh] 172.31.23.3:22: waiting for kubernetes api to respond 
INFO [ssh] 172.31.20.207:22: generating token     
INFO [ssh] 172.31.22.124:22: writing join token   
INFO [ssh] 172.31.22.124:22: installing k0s controller 
INFO [ssh] 172.31.22.124:22: starting service     
INFO [ssh] 172.31.22.124:22: waiting for the k0s service to start 
INFO [ssh] 172.31.22.124:22: waiting for kubernetes api to respond 
INFO ==> Running phase: Install workers  
INFO [ssh] 172.31.23.87:22: validating api connection to https://172.31.20.207:6443 
INFO [ssh] 172.31.21.101:22: validating api connection to https://172.31.20.207:6443 
INFO [ssh] 172.31.26.180:22: validating api connection to https://172.31.20.207:6443 
INFO [ssh] 172.31.20.207:22: generating token     
INFO [ssh] 172.31.23.87:22: writing join token    
INFO [ssh] 172.31.26.180:22: writing join token   
INFO [ssh] 172.31.21.101:22: writing join token   
INFO [ssh] 172.31.26.180:22: installing k0s worker 
INFO [ssh] 172.31.21.101:22: installing k0s worker 
INFO [ssh] 172.31.23.87:22: installing k0s worker 
INFO [ssh] 172.31.26.180:22: starting service     
INFO [ssh] 172.31.26.180:22: waiting for node to become ready 
INFO [ssh] 172.31.21.101:22: starting service     
INFO [ssh] 172.31.23.87:22: starting service      
INFO [ssh] 172.31.21.101:22: waiting for node to become ready 
INFO [ssh] 172.31.23.87:22: waiting for node to become ready 
INFO ==> Running phase: Release exclusive host lock 
INFO ==> Running phase: Disconnect from hosts 
INFO ==> Finished in 1m4s                
INFO k0s cluster version v1.28.4+k0s.0 is now installed 
INFO Tip: To access the cluster you can now fetch the admin kubeconfig using: 
INFO      k0sctl kubeconfig

--> SEE HOW BEAUTIFUL EVERYTHING LOOKS

STEP 4: Get a kubeconfig

k0sctl kubeconfig --config cluster2.yaml
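
One way to use the resulting kubeconfig with kubectl is to write it to a file and point KUBECONFIG at it (the file name is arbitrary):

k0sctl kubeconfig --config cluster2.yaml > kubeconfig
export KUBECONFIG=$PWD/kubeconfig
kubectl get nodes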

STEP 5: Look into the cluster

[ec2-user@ip-172-31-21-103 scripts]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS             RESTARTS        AGE
kube-system   coredns-85df575cdb-b2mp4          0/1     Running            5 (65s ago)     8m49s
kube-system   coredns-85df575cdb-l6z8f          0/1     Running            5 (28s ago)     8m43s
kube-system   konnectivity-agent-64scx          1/1     Running            6 (2m4s ago)    8m42s
kube-system   konnectivity-agent-n6gvz          1/1     Running            6 (2m2s ago)    8m48s
kube-system   konnectivity-agent-q7hfn          1/1     Running            6               8m48s
kube-system   kube-proxy-hkzd2                  1/1     Running            0               8m48s
kube-system   kube-proxy-m8lxn                  1/1     Running            0               8m42s
kube-system   kube-proxy-mqg9w                  1/1     Running            0               8m48s
kube-system   kube-router-dxjdv                 1/1     Running            0               8m48s
kube-system   kube-router-nplz2                 1/1     Running            0               8m48s
kube-system   kube-router-wvhzr                 1/1     Running            0               8m41s
kube-system   metrics-server-7556957bb7-lpqr7   0/1     CrashLoopBackOff   6 (2m26s ago)   8m49s

Not working...
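
kubectl describe and the event list are served by the API server itself and don't go through the konnectivity tunnel, so they are a useful first look at why CoreDNS and metrics-server keep restarting (pod names are the ones from the listing above):

kubectl -n kube-system describe pod metrics-server-7556957bb7-lpqr7
kubectl -n kube-system get events --sort-by=.lastTimestamp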

STEP 6: Deploy a test workload (if needed, since the above already shows bad news)

[ec2-user@ip-172-31-21-103 scripts]$ kubectl create deployment mydep --image=nginx --replicas=3 
deployment.apps/mydep created
[ec2-user@ip-172-31-21-103 scripts]$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
mydep-66c55fb688-cbkgl   1/1     Running   0          10s
mydep-66c55fb688-nbhtg   1/1     Running   0          10s
mydep-66c55fb688-pbjpl   1/1     Running   0          10s

Again, looks promising.

[ec2-user@ip-172-31-21-103 scripts]$ kubectl logs mydep-66c55fb688-cbkgl
Error from server: Get "https://172.31.21.101:10250/containerLogs/default/mydep-66c55fb688-cbkgl/nginx": No agent available
[ec2-user@ip-172-31-21-103 scripts]$ kubectl exec mydep-66c55fb688-cbkgl -it -- bash
Error from server: error dialing backend: No agent available

So the pods are running, but they're not actually working!
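
The "No agent available" error seems to point at the konnectivity proxy: kubectl logs and kubectl exec are tunnelled from the API server to the kubelet through konnectivity, so when the agents on the workers can't reach the konnectivity server on the controller, exactly these calls fail even though the pods themselves run. A quick check (assuming nc is installed on the worker; 8132 is the default konnectivity agentPort, visible in the config further down) is whether the controller's ports are reachable from a worker node:

nc -zv 172.31.20.207 8132   # konnectivity agent port (controller IP from the apply output)
nc -zv 172.31.20.207 6443   # kubernetes API port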

FINAL REMARKS:

kke commented

For reference, here is what the k0s section looks like when generated with k0sctl init --k0s:

  k0s:
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: kuberouter
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true

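That section is what the flag mentioned above adds; the template can be regenerated with it included:

k0sctl init --k0s > cluster.yaml
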
When k0s.config is not set, k0sctl will use the output of k0s config create to create a default config.
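
On a host that has the k0s binary installed, that default can be inspected directly (the output file name is just an example):

k0s config create > k0s-default.yaml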

This would then mean that the default config of k0s does not work, which I don't think is likely.

There seems to be some networking problem. The konnectivity agents are having trouble connecting to the k0s controller. I'd guess CoreDNS and metrics-server are failing for similar reasons. Can you check the pod logs directly on the worker node? They should be in /var/log/containers/konnectivity-agent-*.log, /var/log/containers/coredns-*.log and so on.
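
For example, over SSH on one of the worker nodes (the exact file names will vary per pod):

sudo tail -n 50 /var/log/containers/konnectivity-agent-*.log
sudo tail -n 50 /var/log/containers/coredns-*.log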

Something has changed. The same Terraform scripts (running k0sctl init without --k0s) are now working fine.