Set default OS to bottlerocket
Earlier attempts to deploy bottlerocket as the default OS ran into problems. Those problems appear to depend on the bare metal plan in use.
I don't expect problems switching the default to bottlerocket now.
Hello! It's great to see you doing this. For any issues, feel free to reach out at bottlerocket-os/bottlerocket
Here are the net.toml changes needed to make bottlerocket work on an m3.small.x86:
CONTENTS: |
  # Version is required, it will change as we support
  # additional settings
  version = 1
  # "eno1" is the interface name
  # Users may turn on dhcp4 and dhcp6 via boolean
  [enp1s0f0np0]
  dhcp4 = true
  dhcp6 = false
  # Define this interface as the "primary" interface
  # for the system. This IP is what kubelet will use
  # as the node IP. If none of the interfaces has
  # "primary" set, we choose the first interface in
  # the file
  primary = true
The key changes are the interface name (the section header is enp1s0f0np0, not the eno1 named in the comment) and disabling dhcp6.
Here are the bootconfig.data changes needed to make bottlerocket send console output to our SOS consoles.
BOOTCONFIG_CONTENTS: |
  kernel {
      console = "ttyS1,115200n8"
  }
According to the documentation, this should be set in a TinkerbellTemplateConfig, like this:
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
  name: ${cluster_name}
spec:
  template:
    global_timeout: 6000
    id: ""
    name: ${cluster_name}
    tasks:
    - actions:
      - environment:
          COMPRESSED: "true"
          DEST_DISK: /dev/sda
          IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/14/artifacts/raw/1-23/bottlerocket-v1.23.7-eks-d-1-23-4-eks-a-14-amd64.img.gz
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: stream-image
        timeout: 600
      - environment:
          CONTENTS: |
            # Version is required, it will change as we support
            # additional settings
            version = 1
            # "eno1" is the interface name
            # Users may turn on dhcp4 and dhcp6 via boolean
            [enp1s0f0np0]
            dhcp4 = true
            dhcp6 = false
            # Define this interface as the "primary" interface
            # for the system. This IP is what kubelet will use
            # as the node IP. If none of the interfaces has
            # "primary" set, we choose the first interface in
            # the file
            primary = true
          DEST_DISK: /dev/sda12
          DEST_PATH: /net.toml
          DIRMODE: "0755"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-netplan
        pid: host
        timeout: 90
      - environment:
          BOOTCONFIG_CONTENTS: |
            kernel {
                console = "ttyS1,115200n8"
            }
            init {
                systemd.log_level=debug
            }
          DEST_DISK: /dev/sda12
          DEST_PATH: /bootconfig.data
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-bootconfig
        pid: host
        timeout: 90
      - environment:
          DEST_DISK: /dev/sda12
          DEST_PATH: /user-data.toml
          DIRMODE: "0700"
          FS_TYPE: ext4
          GID: "0"
          HEGEL_URLS: http://${pool_admin}:50061,http://${tink_vip}:50061
          MODE: "0644"
          UID: "0"
        image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: write-user-data
        pid: host
        timeout: 90
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
    version: "0.1"
To use this TinkerbellTemplateConfig, you need to modify the generated my-eksa-cluster.yaml file to reference it in the TinkerbellMachineConfig sections, like this:
kind: TinkerbellMachineConfig
metadata:
  name: my-eksa-cluster-cp
spec:
  hardwareSelector:
    type: cp
  osFamily: bottlerocket
  templateRef:
    kind: TinkerbellTemplateConfig
    name: my-eksa-cluster
  users:
Unfortunately, it seems the current version of EKS-A doesn't respect these overrides. I plan to open a bug on aws/eks-anywhere after validating this is still broken in 0.11.1.
/cc @stockholmux
The TinkerbellTemplateConfig file was missing these task-level fields:
name: my-eksa-cluster
volumes:
- /dev:/dev
- /dev/console:/dev/console
- /lib/firmware:/lib/firmware:ro
worker: '{{.device_1}}'
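For anyone hitting the same thing, here is a trimmed sketch of where those fields likely belong. It assumes they sit on the task entry itself, as siblings of `actions`, which is how I read the Tinkerbell template layout. Only the reboot action is shown; the stream-image and write-* actions from the earlier template are elided and would stay unchanged.

```yaml
spec:
  template:
    global_timeout: 6000
    id: ""
    name: my-eksa-cluster
    tasks:
    - actions:
      # stream-image, write-netplan, write-bootconfig and write-user-data
      # actions go here, unchanged (elided in this sketch)
      - image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-14
        name: reboot-image
        pid: host
        timeout: 90
        volumes:
        - /worker:/worker
      # The previously missing fields: assumed to belong to the task,
      # not to any single action
      name: my-eksa-cluster
      volumes:
      - /dev:/dev
      - /dev/console:/dev/console
      - /lib/firmware:/lib/firmware:ro
      worker: '{{.device_1}}'
    version: "0.1"
```

Note that `name`, `volumes`, and `worker` are indented to line up with `actions`, so they describe the whole task rather than the last action in the list.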
Fixed by #31
There are new concerns, raised in this comment, that we'll want to track as separate issues:
#31 (comment)