equinix-labs/terraform-equinix-metal-eks-anywhere

Set default OS to bottlerocket

Closed · 11 comments

Earlier attempts to deploy Bottlerocket as the default OS ran into problems. Those problems are thought to depend on the bare metal plan in use.

I'd expect we won't have problems changing over to bottlerocket now.

Hello! It's great to see you doing this. For any issues, feel free to reach out at bottlerocket-os/bottlerocket

We need Bottlerocket as the OS because Ubuntu is no longer available:
#32

Here are the net.toml changes needed to make Bottlerocket work on an m3.small.x86:

          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1

            # "enp1s0f0np0" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system.  This IP is what kubelet will use
            # as the node IP.  If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true

The key changes are the interface name (enp1s0f0np0) and setting dhcp6 = false.

Here are the bootconfig.data changes needed to make Bottlerocket send console output to our SOS consoles:

          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }

According to the documentation, this should be set in a TinkerbellTemplateConfig, like this:

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
  name: ${cluster_name}
spec:
  template:
    global_timeout: 6000
    id: ""
    name: ${cluster_name}
    tasks:
    - actions:
      - environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/14/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-14-amd64.img.gz
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: stream-image
        timeout: 600
      - environment:
          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1

            # "enp1s0f0np0" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system.  This IP is what kubelet will use
            # as the node IP.  If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true
          DEST_DISK: /dev/sda12
          DEST_PATH: /net.toml
          DIRMODE: "0755"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-netplan
        pid: host
        timeout: 90
      - environment:
          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }
            init {
                systemd.log_level=debug
            }
          DEST_DISK: /dev/sda12
          DEST_PATH: /bootconfig.data
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-bootconfig
        pid: host
        timeout: 90
      - environment:
          DEST_DISK: /dev/sda12
          DEST_PATH: /user-data.toml
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          HEGEL_URLS: http://${pool_admin}:50061,http://${tink_vip}:50061
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-user-data
        pid: host
        timeout: 90
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
    version: "0.1"

To use this TinkerbellTemplateConfig, you need to modify the generated my-eksa-cluster.yaml file to reference it in the TinkerbellMachineConfig sections, like this:

kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster-cp
spec:
  hardwareSelector:
    type: cp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
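
Since the cluster file has more than one TinkerbellMachineConfig section, the same templateRef would presumably go in the worker section too. A rough sketch of what that might look like: the name `my-eksa-cluster-worker` and the selector value `worker` are assumptions for illustration, so use whatever your generated my-eksa-cluster.yaml actually contains.

```yaml
kind: TinkerbellMachineConfig
metadata:
  # Hypothetical name; keep the one from your generated cluster file.
  name: my-eksa-cluster-worker
spec:
  hardwareSelector:
    # Hypothetical value; keep the selector your hardware is labeled with.
    type: worker
  osFamily: bottlerocket
  # Same reference as in the control plane section above.
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
```

Only the templateRef block is new here; the rest of each section stays as generated.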

Unfortunately, it seems the current version of EKS-A doesn't respect these overrides. I plan to open a bug on aws/eks-anywhere after validating this is still broken in 0.11.1.

I was missing these task-level fields from the TinkerbellTemplateConfig file:

      name: my-eksa-cluster
      volumes:
        - /dev:/dev
        - /dev/console:/dev/console
        - /lib/firmware:/lib/firmware:ro
      worker: '{{.device_1}}'
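
For placement, these fields appear to belong on the task itself, as siblings of `actions`. A minimal sketch, assuming the same layout as the template earlier in this thread; only the first action is repeated and the others are elided as a comment:

```yaml
spec:
  template:
    tasks:
    - name: my-eksa-cluster
      # Resolved by Tinkerbell to the machine that runs this task.
      worker: '{{.device_1}}'
      # Host paths mounted into every action container in this task.
      volumes:
        - /dev:/dev
        - /dev/console:/dev/console
        - /lib/firmware:/lib/firmware:ro
      actions:
      - name: stream-image
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        timeout: 600
        environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/14/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-14-amd64.img.gz
      # write-netplan, write-bootconfig, write-user-data, and reboot-image
      # follow here, unchanged from the template above.
    version: "0.1"
```

Without the `/dev` mounts the image2disk and writefile actions presumably cannot see the target disk, which would explain why the template did not work before these fields were added.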

Fixed by #31

New concerns were raised in this comment, and we'll want to open separate issues for them:
#31 (comment)