k0sproject/k0smotron

k0smotron and Cluster API Provider OpenStack (CAPO) don't work together

michaelbayr opened this issue · 6 comments

Hey everyone,

thanks for this awesome project! I was curious to test k0smotron as a control plane for OpenStack workers. I checked out the AWS/Hetzner examples from the Mirantis resources and tried to come up with a cluster.yaml to apply.
I am running CAPI 1.6.0, CAPO 0.8.0, and k0smotron 0.7.2.
The placeholders in braces, e.g. {b64CERT}, are just redacted credentials.
When I apply the following YAML, the k0smotron control plane spawns, allocates a load balancer, and is accessible via kubeconfig. However, the OpenStack worker machine is never spawned. The credentials are correct, since CAPO logs the check, but CAPO never reaches the point where it actually boots up any machines. It just sits there without any errors and waits.

The output of clusterctl:

NAME                                                                 READY  SEVERITY  REASON                           SINCE  MESSAGE                                                       
Cluster/testcluster1                                               False  Info      WaitingForInfrastructure         42m                                                                   
├─ClusterInfrastructure - OpenStackCluster/testcluster1                                                                                                                                    
├─ControlPlane - K0smotronControlPlane/testcluster1                                                                                                                                        
└─Workers                                                                                                                                                                                    
  └─MachineDeployment/testcluster1-md-0                            False  Warning   WaitingForAvailableMachines      2h  Minimum availability requires 1 replicas, current 0 available  
    └─Machine/testcluster1-md-0-vmnk4-k29bw                        False  Info      WaitingForClusterInfrastructure  2h  1 of 2 completed                                               
      └─BootstrapConfig - K0sWorkerConfig/testcluster1-md-0-5qmzp         

The posted YAML:

apiVersion: v1
kind: Namespace
metadata:
  name: testcluster # Namespace in which we want to deploy the child cluster
spec: {}
status: {}
---
apiVersion: v1
data:
  cacert: {b64CERT}
  clouds.yaml: {b64CLOUDS}
kind: Secret
metadata:
  labels:
    clusterctl.cluster.x-k8s.io/move: "true"
  name: testcluster1-cloud-config
  namespace: testcluster
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: testcluster1 # Cluster Name
  namespace: testcluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: [10.244.0.0/16]
    services:
      cidrBlocks: [10.96.0.0/12]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: K0smotronControlPlane
    name: testcluster1 # Reference to the k0s control plane we are creating in the management cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
    kind: OpenStackCluster
    name: testcluster1 # Reference to the OpenStackCluster object that triggers the CAPO controller
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0smotronControlPlane
metadata:
  name: testcluster1 # Cluster Name
  namespace: testcluster
spec:
  k0sVersion: v1.27.8-k0s.0 # https://github.com/k0sproject/k0s/releases
  persistence:
    type: emptyDir
  service:
    type: LoadBalancer
    apiPort: 6443
    konnectivityPort: 8132
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackCluster
metadata:
  name: testcluster1
  namespace: testcluster
  annotations:
    cluster.x-k8s.io/managed-by: k0smotron
spec:
  apiServerLoadBalancer:
    enabled: false
  cloudName: {CLOUDNAME}
  dnsNameservers:
    - {NAMESERVERS}
  externalNetworkId: {EXTERNALNET}
  identityRef:
    kind: Secret
    name: testcluster1-cloud-config
  managedSecurityGroups: true
  nodeCidr: 10.8.0.0/20
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  clusterName: testcluster1
  replicas: 1
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: K0sWorkerConfigTemplate
          name: testcluster1-md-0
      clusterName: testcluster1
      failureDomain: {AZ}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
        kind: OpenStackMachineTemplate
        name: testcluster1-md-0
      version: v1.27.8
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackMachineTemplate
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  template:
    spec:
      cloudName: {CLOUDNAME}
      flavor: {FLAVOR}
      identityRef:
        kind: Secret
        name: testcluster1-cloud-config
      image: {IMAGE}
      sshKeyName: {KEYNAME}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: K0sWorkerConfigTemplate
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  template:
    spec:
      version: v1.27.8+k0s.0

Any pointers/help would be greatly appreciated!

However, the OpenStack worker machine is never spawned. The credentials are correct, since CAPO logs the check, but CAPO never reaches the point where it actually boots up any machines. It just sits there without any errors and waits.

This sounds like it's waiting on some object status to get ready.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackCluster
metadata:
  name: testcluster1
  namespace: testcluster
  annotations:
    cluster.x-k8s.io/managed-by: k0smotron # <-- this is a bit of a suspect for me

I haven't used the CAPO provider, so I'm not sure whether you HAVE to use that annotation. In the AWS case we use it to prevent CAPA from provisioning things it doesn't need to, since CAPA does not expose options to disable all of its provisioning, e.g. the control plane LB.

What this annotation also means (AFAIK) is that CAPO will NOT provision anything on the infra. It also has the side effect that the OpenStackCluster/testcluster1 object never reaches ready status. In the AWS case it's the same, so one needs to manually patch it ready:

kubectl patch OpenStackCluster testcluster1 -n testcluster --type=merge --subresource status --patch 'status: {ready: true}'
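
As a quick sanity check (this one-liner is mine, not from the thread), the ready flag can be read back from the object's status afterwards:

kubectl get openstackcluster testcluster1 -n testcluster -o jsonpath='{.status.ready}'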

Thank you so much for your input. You were right: the cluster.x-k8s.io/managed-by: k0smotron annotation did stop CAPO from provisioning any machines. After removing that line, CAPO successfully schedules the machines and they join the cluster. A big step forward!
Sadly, there is still a small issue. The clusterctl output stays in WaitingForAvailableMachines, although everything seems to be provisioned fine. Usually this message indicates that no CNI is installed in the cluster. As far as I am aware, k0s, and therefore k0smotron, should ship with kube-router (or Calico) out of the box? The kube-router pod is started and running on the worker node. Any idea why this happens?

clusterctl describe cluster testcluster1 -n testcluster     
NAME                                                                 READY  SEVERITY  REASON                       SINCE  MESSAGE                                                       
Cluster/testcluster1                                               True                                          16m                                                                   
├─ClusterInfrastructure - OpenStackCluster/testcluster1                                                                                                                                
├─ControlPlane - K0smotronControlPlane/testcluster1                                                                                                                                    
└─Workers                                                                                                                                                                                
  └─MachineDeployment/testcluster1-md-0                            False  Warning   WaitingForAvailableMachines  18m    Minimum availability requires 1 replicas, current 0 available  
    └─Machine/testcluster1-md-0-j9jh2-f2292                        True                                          15m                                                                   
      └─BootstrapConfig - K0sWorkerConfig/testcluster1-md-0-xkj9w 

For anybody finding this issue later on: here is the updated cluster-template.yaml. Note that I had to extend the OpenStackCluster spec with disableAPIServerFloatingIP: true and an empty apiServerFixedIP, because the deployment would fail without them (k0smotron tries to patch the LoadBalancer IP into the cluster, but the spec is immutable).

apiVersion: v1
kind: Namespace
metadata:
  name: testcluster # Namespace in which we want to deploy the child cluster
spec: {}
status: {}
---
apiVersion: v1
data:
  cacert: {b64CERT}
  clouds.yaml: {b64CLOUDS}
kind: Secret
metadata:
  labels:
    clusterctl.cluster.x-k8s.io/move: "true"
  name: testcluster1-cloud-config
  namespace: testcluster
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: testcluster1 # Cluster Name
  namespace: testcluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: [10.244.0.0/16]
    services:
      cidrBlocks: [10.96.0.0/12]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: K0smotronControlPlane
    name: testcluster1 # Reference to the k0s control plane we are creating in the management cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
    kind: OpenStackCluster
    name: testcluster1 # Reference to the OpenStackCluster object that triggers the CAPO controller
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0smotronControlPlane
metadata:
  name: testcluster1 # Cluster Name
  namespace: testcluster
spec:
  k0sVersion: v1.27.8-k0s.0 # https://github.com/k0sproject/k0s/releases
  persistence:
    type: emptyDir
  service:
    type: LoadBalancer
    apiPort: 6443
    konnectivityPort: 8132
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackCluster
metadata:
  name: testcluster1
  namespace: testcluster
spec:
  apiServerLoadBalancer:
    enabled: false
  disableAPIServerFloatingIP: true
  apiServerFixedIP: ""
  cloudName: {CLOUDNAME}
  dnsNameservers:
    - {NAMESERVERS}
  externalNetworkId: {EXTERNALNET}
  identityRef:
    kind: Secret
    name: testcluster1-cloud-config
  managedSecurityGroups: true
  nodeCidr: 10.8.0.0/20
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  clusterName: testcluster1
  replicas: 1
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: K0sWorkerConfigTemplate
          name: testcluster1-md-0
      clusterName: testcluster1
      failureDomain: {AZ}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
        kind: OpenStackMachineTemplate
        name: testcluster1-md-0
      version: v1.27.8
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackMachineTemplate
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  template:
    spec:
      cloudName: {CLOUDNAME}
      flavor: {FLAVOR}
      identityRef:
        kind: Secret
        name: testcluster1-cloud-config
      image: {IMAGE}
      sshKeyName: {KEYNAME}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: K0sWorkerConfigTemplate
metadata:
  name: testcluster1-md-0
  namespace: testcluster
spec:
  template:
    spec:
      version: v1.27.8+k0s.0

Did I understand correctly that the machine is up and running, but the MachineDeployment never gets fully ready? If that is the case, then setting up a cloud controller manager (CCM) in the child cluster should fix it. Unfortunately, it is 100% required by CAPI itself, as CAPI uses the provider ID set by the CCM to map the child cluster's Node objects to the Machine objects. This is also not so well documented in the CAPI docs.
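
One way to see whether this is the problem (the command below is my sketch, not from the thread; child.kubeconfig stands for the child cluster's kubeconfig) is to check whether the child cluster's Node objects have spec.providerID set. Without a CCM the field stays empty, and CAPI cannot do the mapping:

kubectl --kubeconfig child.kubeconfig get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'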

Thanks again for your help!
Yeah, I thought that this might be the issue. However, I have not yet figured out how to install the OpenStack cloud controller manager into the child cluster.
Since k0smotron runs the control plane in a container, the CCM cannot run on the control plane, which is where it usually runs. Should it instead just run on the worker nodes?
The other issue I found was the preparation of the workers. To be able to use the CCM, the workers have to be started with certain install flags (see the k0s docs here: https://docs.k0sproject.io/v1.28.4+k0s.0/cloud-providers/#enable-cloud-provider-support-in-kubelet). However, I could not find any pointers on how to configure that from within Cluster API (e.g. in the bootstrap config).
Is there any documentation on installing a CCM into a child cluster?

Here's an example with another, non-OpenStack CCM (Hetzner's hcloud CCM), including the worker kubelet flags:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0smotronControlPlane
metadata:
  name: autoscaler-test
  namespace: autoscaler-test
spec:
  k0sVersion: v1.27.4-k0s.0
  persistence:
    type: pvc
    persistentVolumeClaim:
      spec:
        storageClassName: hcloud-volumes
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
  service:
    type: LoadBalancer
    apiPort: 6443
    konnectivityPort: 8132
    annotations:
      load-balancer.hetzner.cloud/location: fsn1
  k0sConfig:
    apiVersion: k0s.k0sproject.io/v1beta1
    kind: ClusterConfig
    spec:
      extensions:
        helm:
          repositories:
            - name: hcloud
              url: https://charts.hetzner.cloud
          charts:
            - name: hccm
              chartName: hcloud/hcloud-cloud-controller-manager
              namespace: kube-system
              version: v1.18.0
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: K0sWorkerConfigTemplate
metadata:
  name: as-hetzner-mc
  namespace: autoscaler-test
spec:
  template:
    spec:
      version: v1.27.4+k0s.0
      args:
        - --enable-cloud-provider
        - --kubelet-extra-args="--cloud-provider=external"
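
For the OpenStack case, a similar sketch could pull in the upstream cloud-provider-openstack chart via the same k0s Helm extensions mechanism. The repository URL and chart name below come from the upstream cloud-provider-openstack project; the values override is an assumption to let the CCM schedule onto workers (the child cluster has no control plane nodes), and the chart additionally expects an OpenStack cloud-config Secret in the child cluster, which is not shown here:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0smotronControlPlane
metadata:
  name: testcluster1
  namespace: testcluster
spec:
  k0sVersion: v1.27.8-k0s.0
  persistence:
    type: emptyDir
  service:
    type: LoadBalancer
    apiPort: 6443
    konnectivityPort: 8132
  k0sConfig:
    apiVersion: k0s.k0sproject.io/v1beta1
    kind: ClusterConfig
    spec:
      extensions:
        helm:
          repositories:
            - name: cpo
              url: https://kubernetes.github.io/cloud-provider-openstack
          charts:
            - name: openstack-ccm
              chartName: cpo/openstack-cloud-controller-manager
              namespace: kube-system
              values: |
                # Assumption: clear the chart's control-plane nodeSelector so the
                # CCM can run on worker nodes, since the control plane is containerized.
                nodeSelector: {}

The worker side would keep the same args block as in the Hetzner example above (--enable-cloud-provider plus --kubelet-extra-args="--cloud-provider=external").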

@jnummelin thank you so much for your support.
With your help and the YAML examples I managed to get the CCM installed, and indeed the cluster now looks completely healthy! This makes k0smotron a strong contender for us.