aws/eks-anywhere

Custom TinkerbellTemplateConfig fails to work


What happened: When using a custom TinkerbellTemplateConfig, the resulting Tinkerbell workflow is empty and the cluster fails to provision.

What you expected to happen: For the cluster to provision.

How to reproduce it (as minimally and precisely as possible):

Here's my my-eksa-cluster.yaml:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: my-eksa-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - 192.168.0.0/16
    services:
      cidrBlocks:
        - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "147.75.202.254"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: my-eksa-cluster-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: my-eksa-cluster
  kubernetesVersion: "1.23"
  managementCluster:
    name: my-eksa-cluster
  workerNodeGroupConfigurations:
    - count: 1
      machineGroupRef:
        kind: TinkerbellMachineConfig
        name: my-eksa-cluster
      name: md-0
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: my-eksa-cluster
spec:
  tinkerbellIP: "147.75.202.253"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster-cp
spec:
  hardwareSelector:
    type: cp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
    - name: ec2-user
      sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC1Zweba/X6qrXQ6ubIkZHq1yFF9VRlMUiK457vtuI0Psdg73OLJmh67XmhZ6QkRQjToLYZ5PzppL4QVVPceyA5OHkh8E8HHg3JsZTynXo7YoneI7PQP6DIPjd3z4T28zox6gNNsVpkoeMPmCxJJg5y+9vz8PbEHsFUX9MmWYLWCgltXT+Cr/hudNNxZB4nD2EhNffrRsLlmxf/Cl8fH4xHBSB3W+AKit9cdIXM2SRxUQ2drq2HTiPuFv75+8t4ZvX4j/szV0Z9TguLR2vILzhv/K7FD1LMeGOS/fi5YdIoKy2/46j3ooeuP9OUUkFK5y1Q1dhbdtZWIn6ImmPkjEAAsWl4c4ApvycgkDlMqdKegspmYjtaf1yACacS4tAuZyhuNObMiX0SfwEisiNm8QgOfZsVBwrvAAL2qRosmDMKk5rQpKMsn7yXhbSwtEmFdnODymAxrKezy54C9H0xwDE0YER3FFf56/RzEaQ5Lfyh03kZcOdSe5nIGz4FlSWJ79S9VuS5nxx3kTgHOPa1G1D3MTps4bVUCcR4rJOqHQTPDIG+Xk5Zr377oG0VMQE664KbrcdJ7jujUpxV/Krnm7z/lzl9EkecNHYg8W83XVNqoIA5oZ5R0OmqceQkjOcunCOqQOOSxVoHGS0nreMine0HoVYCJg6vcws4Qc1qCiPiJQ==
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster
spec:
  hardwareSelector:
    type: dp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
    - name: ec2-user
      sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC1Zweba/X6qrXQ6ubIkZHq1yFF9VRlMUiK457vtuI0Psdg73OLJmh67XmhZ6QkRQjToLYZ5PzppL4QVVPceyA5OHkh8E8HHg3JsZTynXo7YoneI7PQP6DIPjd3z4T28zox6gNNsVpkoeMPmCxJJg5y+9vz8PbEHsFUX9MmWYLWCgltXT+Cr/hudNNxZB4nD2EhNffrRsLlmxf/Cl8fH4xHBSB3W+AKit9cdIXM2SRxUQ2drq2HTiPuFv75+8t4ZvX4j/szV0Z9TguLR2vILzhv/K7FD1LMeGOS/fi5YdIoKy2/46j3ooeuP9OUUkFK5y1Q1dhbdtZWIn6ImmPkjEAAsWl4c4ApvycgkDlMqdKegspmYjtaf1yACacS4tAuZyhuNObMiX0SfwEisiNm8QgOfZsVBwrvAAL2qRosmDMKk5rQpKMsn7yXhbSwtEmFdnODymAxrKezy54C9H0xwDE0YER3FFf56/RzEaQ5Lfyh03kZcOdSe5nIGz4FlSWJ79S9VuS5nxx3kTgHOPa1G1D3MTps4bVUCcR4rJOqHQTPDIG+Xk5Zr377oG0VMQE664KbrcdJ7jujUpxV/Krnm7z/lzl9EkecNHYg8W83XVNqoIA5oZ5R0OmqceQkjOcunCOqQOOSxVoHGS0nreMine0HoVYCJg6vcws4Qc1qCiPiJQ==
---
{}
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
  name: my-eksa-cluster
spec:
  template:
    global_timeout: 6000
    id: ""
    name: my-eksa-cluster
    tasks:
    - actions:
      - environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/14/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-14-amd64.img.gz
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: stream-image
        timeout: 600
      - environment:
          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1

            # "eno1" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system.  This IP is what kubelet will use
            # as the node IP.  If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true
          DEST_DISK: /dev/sda12
          DEST_PATH: /net.toml
          DIRMODE: "0755"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-netplan
        pid: host
        timeout: 90
      - environment:
          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }
            init {
                systemd.log_level=debug
            }
          DEST_DISK: /dev/sda12
          DEST_PATH: /bootconfig.data
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-bootconfig
        pid: host
        timeout: 90
      - environment:
          DEST_DISK: /dev/sda12
          DEST_PATH: /user-data.toml
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          HEGEL_URLS: http://147.75.202.242:50061,http://147.75.202.253:50061
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-user-data
        pid: host
        timeout: 90
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
    version: "0.1"

After running eksctl anywhere create cluster --filename my-eksa-cluster.yaml --hardware-csv hardware.csv --tinkerbell-bootstrap-ip 147.75.202.242, the output looks good:

Warning: The recommended number of control plane nodes is 3 or 5
Warning: The recommended number of control plane nodes is 3 or 5
Performing setup and validations
✅ Tinkerbell Provider setup is valid
✅ Validate certificate for registry mirror
✅ Validate authentication for git provider
✅ Create preflight validations pass
Creating new bootstrap cluster
Provider specific pre-capi-install-setup on bootstrap cluster
Installing cluster-api providers on bootstrap cluster
Provider specific post-setup
Creating new workload cluster

However, it never gets past the "Creating new workload cluster" step. The workload machines boot into LinuxKit, but just stay there.

The tink-server and tink-controller logs are more illuminating:

root@eksa-admin:~# kubectl -n eksa-system logs tink-server-69bb8bc84-fnhfs
{"level":"info","ts":1661524365.8658762,"caller":"tink-server/main.go:249","msg":"no config file found","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661524365.8661466,"caller":"metrics/metrics.go:58","msg":"initializing label values","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661524365.8665445,"caller":"tink-server/main.go:130","msg":"starting version 8011b72","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661524365.9585457,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:41147"}
{"level":"info","ts":1661524365.9585578,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","kind":"health probe","addr":"[::]:43409"}
{"level":"info","ts":1661524365.9593925,"caller":"tink-server/main.go:207","msg":"started listener","service":"github.com/tinkerbell/tink","address":"[::]:42113"}
{"level":"info","ts":1661524365.9595804,"caller":"http-server/http_server.go:31","msg":"serving http","service":"github.com/tinkerbell/tink"}
root@eksa-admin:~# kubectl -n eksa-system logs tink-controller-manager-7cbf8c4d66-pf68b
{"level":"info","ts":1661524367.067631,"caller":"tink-controller/main.go:106","msg":"no config file found","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661524367.0677176,"caller":"tink-controller/main.go:60","msg":"starting controller version 8011b72","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661524367.1584876,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","kind":"health probe","addr":"[::]:46545"}
{"level":"info","ts":1661524367.158503,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:43017"}
I0826 14:32:47.259668       1 leaderelection.go:248] attempting to acquire leader lease eksa-system/tink-leader-election...
I0826 14:32:47.271599       1 leaderelection.go:258] successfully acquired lease eksa-system/tink-leader-election
{"level":"info","ts":1661524367.271899,"logger":"fallback.controller.workflow","caller":"controller/controller.go:178","msg":"Starting EventSource","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","source":"kind source: *v1alpha1.Workflow"}
{"level":"info","ts":1661524367.2719789,"logger":"fallback.controller.workflow","caller":"controller/controller.go:186","msg":"Starting Controller","reconciler group":"tinkerbell.org","reconciler kind":"Workflow"}
{"level":"info","ts":1661524367.2720811,"logger":"fallback.controller.workflow","caller":"controller/controller.go:220","msg":"Starting workers","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","worker count":1}
{"level":"info","ts":1661524405.3835952,"logger":"fallback.controller.workflow","caller":"workflow/controller.go:40","msg":"Reconciling","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661524403587-6lcgk","namespace":"eksa-system"}
{"level":"error","ts":1661524405.4845738,"logger":"fallback.controller.workflow","caller":"controller/controller.go:317","msg":"Reconciler error","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661524403587-6lcgk","namespace":"eksa-system","error":"validating workflow template: name cannot be empty","errorVerbose":"name cannot be empty\ngithub.com/tinkerbell/tink/workflow.validate\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:127\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:36\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581\nvalidating workflow 
template\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:37\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:317\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1661524405.49004,"logger":"fallback.controller.workflow","caller":"workflow/controller.go:40","msg":"Reconciling","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661524403587-6lcgk","namespace":"eksa-system"}
{"level":"error","ts":1661524405.4907446,"logger":"fallback.controller.workflow","caller":"controller/controller.go:317","msg":"Reconciler error","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661524403587-6lcgk","namespace":"eksa-system","error":"validating workflow template: name cannot be empty","errorVerbose":"name cannot be empty\ngithub.com/tinkerbell/tink/workflow.validate\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:127\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:36\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581\nvalidating workflow 
template\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:37\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:317\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227"}

This repeats until timeout.
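
Since the validator is rejecting the rendered template itself, one quick check is to dump just the data field that tink parses, using standard kubectl jsonpath (a sketch; tpl is the short resource name for Tinkerbell templates, and the template name is taken from the log messages above):

kubectl -n eksa-system get tpl my-eksa-cluster-control-plane-template-1661524403587-6lcgk -o jsonpath='{.spec.data}'

That prints the raw template YAML the controller is validating, which we'll also see via describe below.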

So if we check on some of the cluster objects, we can see this:

root@eksa-admin:~# kubectl -n eksa-system get ma
NAME                                    CLUSTER           NODENAME   PROVIDERID   PHASE          AGE     VERSION
my-eksa-cluster-md-0-798f5b8594-ntqnb   my-eksa-cluster                           Pending        6m22s   v1.23.7-eks-1-23-4
my-eksa-cluster-p5wgl                   my-eksa-cluster                           Provisioning   6m22s   v1.23.7-eks-1-23-4
root@eksa-admin:~# kubectl -n eksa-system get tinkerbellmachine
NAME                                                         CLUSTER           STATE   READY   INSTANCEID                                  MACHINE
my-eksa-cluster-control-plane-template-1661524403587-6lcgk   my-eksa-cluster                   tinkerbell://eksa-system/eksa-node-cp-001   my-eksa-cluster-p5wgl
my-eksa-cluster-md-0-1661524403588-vgzm6                     my-eksa-cluster                                                               my-eksa-cluster-md-0-798f5b8594-ntqnb
root@eksa-admin:~# kubectl -n eksa-system get tpl
NAME                                                         STATE
my-eksa-cluster-control-plane-template-1661524403587-6lcgk
root@eksa-admin:~# kubectl -n eksa-system get hw
NAME               STATE
eksa-node-cp-001
eksa-node-dp-001
root@eksa-admin:~# kubectl -n eksa-system get wf
NAME                                                         TEMPLATE                                                     STATE
my-eksa-cluster-control-plane-template-1661524403587-6lcgk   my-eksa-cluster-control-plane-template-1661524403587-6lcgk

The empty state fields are concerning, so let's check out the workflow:

root@eksa-admin:~# kubectl -n eksa-system describe wf
Name:         my-eksa-cluster-control-plane-template-1661524403587-6lcgk
Namespace:    eksa-system
Labels:       <none>
Annotations:  <none>
API Version:  tinkerbell.org/v1alpha1
Kind:         Workflow
Metadata:
  Creation Timestamp:  2022-08-26T14:33:25Z
  Generation:          1
  Managed Fields:
    API Version:  tinkerbell.org/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"04162f45-d584-4d8f-a208-ba3bf4a31c3f"}:
      f:spec:
        .:
        f:hardwareMap:
          .:
          f:device_1:
        f:templateRef:
    Manager:    manager
    Operation:  Update
    Time:       2022-08-26T14:33:25Z
  Owner References:
    API Version:     infrastructure.cluster.x-k8s.io/v1beta1
    Controller:      true
    Kind:            TinkerbellMachine
    Name:            my-eksa-cluster-control-plane-template-1661524403587-6lcgk
    UID:             04162f45-d584-4d8f-a208-ba3bf4a31c3f
  Resource Version:  1655
  UID:               109d9e75-6c10-43d9-b82c-d6d7ec938c90
Spec:
  Hardware Map:
    device_1:    10:70:fd:7f:99:a2
  Template Ref:  my-eksa-cluster-control-plane-template-1661524403587-6lcgk
Events:          <none>

And its template:

root@eksa-admin:~# kubectl -n eksa-system describe tpl
Name:         my-eksa-cluster-control-plane-template-1661524403587-6lcgk
Namespace:    eksa-system
Labels:       <none>
Annotations:  <none>
API Version:  tinkerbell.org/v1alpha1
Kind:         Template
Metadata:
  Creation Timestamp:  2022-08-26T14:33:25Z
  Generation:          1
  Managed Fields:
    API Version:  tinkerbell.org/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"04162f45-d584-4d8f-a208-ba3bf4a31c3f"}:
      f:spec:
        .:
        f:data:
    Manager:    manager
    Operation:  Update
    Time:       2022-08-26T14:33:25Z
  Owner References:
    API Version:     infrastructure.cluster.x-k8s.io/v1beta1
    Kind:            TinkerbellMachine
    Name:            my-eksa-cluster-control-plane-template-1661524403587-6lcgk
    UID:             04162f45-d584-4d8f-a208-ba3bf4a31c3f
  Resource Version:  1654
  UID:               1c932eba-9c10-464e-bf37-125d4bb13181
Spec:
  Data:  global_timeout: 6000
id: ""
name: my-eksa-cluster
tasks:
- actions:
  - environment:
      COMPRESSED: "true"
      DEST_DISK: /dev/sda
      IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/14/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-14-amd64.img.gz
    image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
    name: stream-image
    timeout: 600
  - environment:
      CONTENTS: |
        # Version is required, it will change as we support
        # additional settings
        version = 1

        # "eno1" is the interface name
        # Users may turn on dhcp4 and dhcp6 via boolean
        [enp1s0f0np0]
        dhcp4 = true
        dhcp6 = false
        # Define this interface as the "primary" interface
        # for the system.  This IP is what kubelet will use
        # as the node IP.  If none of the interfaces has
        # "primary" set, we choose the first interface in
        # the file
        primary = true
      DEST_DISK: /dev/sda12
      DEST_PATH: /net.toml
      DIRMODE: "0755"
      FS_TYPE: ext4
      GID: "0"
      MODE: "0644"
      UID: "0"
    image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
    name: write-netplan
    pid: host
    timeout: 90
  - environment:
      BOOTCONFIG_CONTENTS: |
        kernel {
            console = "ttyS1,115200n8"
        }
        init {
            systemd.log_level=debug
        }
      DEST_DISK: /dev/sda12
      DEST_PATH: /bootconfig.data
      DIRMODE: "0700"
      FS_TYPE: ext4
      GID: "0"
      MODE: "0644"
      UID: "0"
    image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
    name: write-bootconfig
    pid: host
    timeout: 90
  - environment:
      DEST_DISK: /dev/sda12
      DEST_PATH: /user-data.toml
      DIRMODE: "0700"
      FS_TYPE: ext4
      GID: "0"
      HEGEL_URLS: http://147.75.202.242:50061,http://147.75.202.253:50061
      MODE: "0644"
      UID: "0"
    image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
    name: write-user-data
    pid: host
    timeout: 90
  - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
    name: reboot-image
    pid: host
    timeout: 90
    volumes:
    - /worker:/worker
  name: ""
  worker: ""
version: "0.1"

Events:  <none>

Anything else we need to know?: What else can I gather for you?

Environment:

  • EKS Anywhere Release: 0.11.1
  • Equinix Metal

Hey @cprivitere, thanks for reporting this. Would you mind modifying your TinkerbellTemplateConfig as follows? Let me know if that helps, thanks!

diff --git a/current.yaml b/suggested.yaml
index 680348f..126877c 100644
--- a/current.yaml
+++ b/suggested.yaml
@@ -82,4 +82,5 @@ spec:
         timeout: 90
         volumes:
         - /worker:/worker
+      worker: '{{.device_1}}'
     version: "0.1"
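
For context, that adds worker at the task level (a sibling of actions, not inside it), so the tail of the template would look like the sketch below; device_1 is the key populated from the workflow's spec.hardwareMap shown earlier, so {{.device_1}} should render to the node's MAC address:

      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
      worker: '{{.device_1}}'
    version: "0.1"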

Sure, I added that, but it made no difference. Here's the new output from the tink-controller:

root@eksa-admin:~# kubectl -n eksa-system logs tink-controller-manager-7cbf8c4d66-5bcsl
{"level":"info","ts":1661538534.7211745,"caller":"tink-controller/main.go:106","msg":"no config file found","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661538534.721232,"caller":"tink-controller/main.go:60","msg":"starting controller version 8011b72","service":"github.com/tinkerbell/tink"}
{"level":"info","ts":1661538534.811302,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:42173"}
{"level":"info","ts":1661538534.811356,"logger":"fallback","caller":"manager/internal.go:362","msg":"Starting server","kind":"health probe","addr":"[::]:38793"}
I0826 18:28:54.911702       1 leaderelection.go:248] attempting to acquire leader lease eksa-system/tink-leader-election...
I0826 18:28:54.924157       1 leaderelection.go:258] successfully acquired lease eksa-system/tink-leader-election
{"level":"info","ts":1661538534.924403,"logger":"fallback.controller.workflow","caller":"controller/controller.go:178","msg":"Starting EventSource","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","source":"kind source: *v1alpha1.Workflow"}
{"level":"info","ts":1661538534.9245145,"logger":"fallback.controller.workflow","caller":"controller/controller.go:186","msg":"Starting Controller","reconciler group":"tinkerbell.org","reconciler kind":"Workflow"}
{"level":"info","ts":1661538534.924631,"logger":"fallback.controller.workflow","caller":"controller/controller.go:220","msg":"Starting workers","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","worker count":1}
{"level":"info","ts":1661538581.1293182,"logger":"fallback.controller.workflow","caller":"workflow/controller.go:40","msg":"Reconciling","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661538579174-bt6n6","namespace":"eksa-system"}
{"level":"error","ts":1661538581.2307472,"logger":"fallback.controller.workflow","caller":"controller/controller.go:317","msg":"Reconciler error","reconciler group":"tinkerbell.org","reconciler kind":"Workflow","name":"my-eksa-cluster-control-plane-template-1661538579174-bt6n6","namespace":"eksa-system","error":"validating workflow template: name cannot be empty","errorVerbose":"name cannot be empty\ngithub.com/tinkerbell/tink/workflow.validate\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:127\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:36\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581\nvalidating workflow 
template\ngithub.com/tinkerbell/tink/workflow.Parse\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:37\ngithub.com/tinkerbell/tink/workflow.RenderTemplateHardware\n\tgithub.com/tinkerbell/tink/workflow/template_validator.go:95\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).processNewWorkflow\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:90\ngithub.com/tinkerbell/tink/pkg/controllers/workflow.(*Controller).Reconcile\n\tgithub.com/tinkerbell/tink/pkg/controllers/workflow/controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\truntime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:317\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227"}

Here's the new my-eksa-cluster.yaml:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: my-eksa-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - 192.168.0.0/16
    services:
      cidrBlocks:
        - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "147.75.202.254"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: my-eksa-cluster-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: my-eksa-cluster
  kubernetesVersion: "1.23"
  managementCluster:
    name: my-eksa-cluster
  workerNodeGroupConfigurations:
    - count: 1
      machineGroupRef:
        kind: TinkerbellMachineConfig
        name: my-eksa-cluster
      name: md-0
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: my-eksa-cluster
spec:
  tinkerbellIP: "147.75.202.253"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster-cp
spec:
  hardwareSelector:
    type: cp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
    - name: ec2-user
      sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDCwAFhyyBY/dLK7fq2redJswiJ/ecViaTMqnJYeSX+tWZ3qJFWWkZPDEXQ1Hpf6vguIhj7SRiWdxMtxifBvJyTRdXLdQQ+ueGqJerYbx6qKxvKvZ8ytqjX7dvLv34VbZeM3fOOTNYAnpb7sW2jJ384DKaQ8AQpqIJn+t8PkqKxNMyY8nbbg7X0SZWeGgFg+z8BibxurRWsv7ZD6ujlj4LuXPV8wL0K21HKDLkiBvgj6IdArL6vwSbXKe0VQByWkwVCVQcP16UVcbGPEKiI6/NYDqvb7931goch3Et7qJHrcg0y2YLChzyZujlCCvFPCK3XJpn7lhIJiiJEGAYe08/ANx6lYCiVk5CXpvBNLdioEQWefaXq6ohlknU5d7ZUB7VkRvk62D8T/Hl27ml+Y0ElmZcD2vTOJ1EZFkmJmHpVu1r7uj6wTOOZCPmGkbB+H1fDiX/BCmaKCW41ePr1SUz6v4NnirCZd+zFUcmpBObQcwgHXcKlp7tqdgF7b6ySQEbcRQAIuLUrd/KoZJm8f/UpAzL8jDwctZDZr1Z4NGqT3ZhViF79Tuo2gNg19qL8EGVMMAKj3U5gvXGdf00JKHOtNieiTBmVcnaw2w0+7Vt1nVaTy2v2cnxMr688dqta5Bv8VFFuRVoUVT4sir45OzwMUQAAydK9BTQ1FFNBaCpZXQ==
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster
spec:
  hardwareSelector:
    type: dp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
    - name: ec2-user
      sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDCwAFhyyBY/dLK7fq2redJswiJ/ecViaTMqnJYeSX+tWZ3qJFWWkZPDEXQ1Hpf6vguIhj7SRiWdxMtxifBvJyTRdXLdQQ+ueGqJerYbx6qKxvKvZ8ytqjX7dvLv34VbZeM3fOOTNYAnpb7sW2jJ384DKaQ8AQpqIJn+t8PkqKxNMyY8nbbg7X0SZWeGgFg+z8BibxurRWsv7ZD6ujlj4LuXPV8wL0K21HKDLkiBvgj6IdArL6vwSbXKe0VQByWkwVCVQcP16UVcbGPEKiI6/NYDqvb7931goch3Et7qJHrcg0y2YLChzyZujlCCvFPCK3XJpn7lhIJiiJEGAYe08/ANx6lYCiVk5CXpvBNLdioEQWefaXq6ohlknU5d7ZUB7VkRvk62D8T/Hl27ml+Y0ElmZcD2vTOJ1EZFkmJmHpVu1r7uj6wTOOZCPmGkbB+H1fDiX/BCmaKCW41ePr1SUz6v4NnirCZd+zFUcmpBObQcwgHXcKlp7tqdgF7b6ySQEbcRQAIuLUrd/KoZJm8f/UpAzL8jDwctZDZr1Z4NGqT3ZhViF79Tuo2gNg19qL8EGVMMAKj3U5gvXGdf00JKHOtNieiTBmVcnaw2w0+7Vt1nVaTy2v2cnxMr688dqta5Bv8VFFuRVoUVT4sir45OzwMUQAAydK9BTQ1FFNBaCpZXQ==
---
{}
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
  name: my-eksa-cluster
spec:
  template:
    global_timeout: 6000
    id: ""
    name: my-eksa-cluster
    tasks:
    - actions:
      - environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/15/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-15-amd64.img.gz
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: stream-image
        timeout: 600
      - environment:
          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1

            # "eno1" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system.  This IP is what kubelet will use
            # as the node IP.  If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true
          DEST_DISK: /dev/sda12
          DEST_PATH: /net.toml
          DIRMODE: "0755"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: write-netplan
        pid: host
        timeout: 90
      - environment:
          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }
            init {
                systemd.log_level=debug
            }
          DEST_DISK: /dev/sda12
          DEST_PATH: /bootconfig.data
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: write-bootconfig
        pid: host
        timeout: 90
      - environment:
          DEST_DISK: /dev/sda12
          DEST_PATH: /user-data.toml
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          HEGEL_URLS: http://147.75.202.242:50061,http://147.75.202.253:50061
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: write-user-data
        pid: host
        timeout: 90
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
      worker: '{{.device_1}}'
    version: "0.1"

OK, I gave it a go, adding this full block (which I saw in the project's testdata) at the end:

      name: my-eksa-cluster
      volumes:
        - /dev:/dev
        - /dev/console:/dev/console
        - /lib/firmware:/lib/firmware:ro
      worker: '{{.device_1}}'

Cluster creates successfully now.

root@eksa-admin:~# kubectl get nodes
NAME             STATUS   ROLES                  AGE     VERSION
147.75.202.243   Ready    control-plane,master   5m16s   v1.23.7-eks-7709a84
147.75.202.244   Ready    <none>                 2m32s   v1.23.7-eks-7709a84
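
For anyone landing here later, the working template's task now ends like this, assembled from the snippets above; name, volumes, and worker sit at the task level, as siblings of actions:

      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
      name: my-eksa-cluster
      volumes:
        - /dev:/dev
        - /dev/console:/dev/console
        - /lib/firmware:/lib/firmware:ro
      worker: '{{.device_1}}'
    version: "0.1"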

Given the error messages from tink-controller, the name: field was clearly the one it wanted; I don't know which of the others are required. Either way, this documentation needs to be updated with the correct values: https://anywhere.eks.amazonaws.com/docs/reference/clusterspec/baremetal/#advanced-bare-metal-cluster-configuration

Additionally, those docs have old EKS image references baked into them. Either the docs need a way for users to generate a default template file whenever eks-a gets updated, or they will need to be updated every time the images change, or some better templating system that ignores the image versions will need to be created.
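
For what it's worth, the top-level cluster config can already be regenerated against the current release with the documented generate command, so something similar that also emits the matching default TinkerbellTemplateConfig would keep the image references from going stale (a sketch using the standard eksctl anywhere plugin):

eksctl anywhere generate clusterconfig my-eksa-cluster --provider tinkerbell > my-eksa-cluster.yaml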

Ah nice. Glad to hear you got it working! I will open a PR for the docs. Thanks for working through this!

PR opened. #3184