awslabs/amazon-eks-ami

Unable to optimize a CIS hardened AMI

Issue closed · 4 comments

What happened:

I am trying to build an EKS-optimized AMI using a CIS-hardened AL2023 AMI as the base. The build succeeds; however, when we launch an EC2 instance from the resulting AMI, it is not able to boot up. The network seems to be broken. Please advise on how to build an EKS-optimized AMI from the CIS-hardened AMI. I don't encounter this issue with AL2; it only happens with AL2023.

This is the command I use to build the AMI (project tag v20240807):

make k8s=1.30 os_distro=al2023

I can share some log output, which I collected by attaching the root volume to another EC2 instance.

Errors logged in the audit log

type=AVC msg=audit(1726760765.652:186): avc:  denied  { read } for  pid=1802 comm="chronyd" name="dhcp.sources" dev="tmpfs" ino=1088 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=file permissive=1
type=AVC msg=audit(1726760765.652:187): avc:  denied  { open } for  pid=1802 comm="chronyd" path="/run/chrony.d/dhcp.sources" dev="tmpfs" ino=1088 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=file permissive=1
type=AVC msg=audit(1726760765.652:188): avc:  denied  { getattr } for  pid=1802 comm="chronyd" path="/run/chrony.d/dhcp.sources" dev="tmpfs" ino=1088 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=file permissive=1
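To see at a glance which process hit which SELinux denials, AVC records like the ones above can be grouped by process and security contexts. The following is a minimal Python sketch, assuming the record layout shown in this excerpt (other audit record types and field orders are not handled); `summarize_avc` is a hypothetical helper, not part of any AWS or SELinux tooling. Note that all three records carry `permissive=1`, which in SELinux audit records means the access was logged but not blocked.

```python
import re
from collections import defaultdict

# Assumption: records follow the layout in the excerpt above
# (avc: denied { perms } for ... comm="..." ... scontext=... tcontext=... tclass=... permissive=N).
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]+?)\s*\}\s+for\s+'
    r'.*?comm="(?P<comm>[^"]+)"'
    r'.*?scontext=(?P<scontext>\S+)'
    r'\s+tcontext=(?P<tcontext>\S+)'
    r'\s+tclass=(?P<tclass>\S+)'
    r'\s+permissive=(?P<permissive>\d)'
)


def summarize_avc(lines):
    """Group denied permissions by (comm, scontext, tcontext, tclass, permissive)."""
    summary = defaultdict(set)
    for line in lines:
        m = AVC_RE.search(line)
        if not m:
            continue  # not an AVC denial in the expected layout
        key = (
            m["comm"],
            m["scontext"],
            m["tcontext"],
            m["tclass"],
            m["permissive"] == "1",  # True: logged only, not enforced
        )
        summary[key].update(m["perms"].split())
    # Sort permissions so the output is stable and easy to compare
    return {key: sorted(perms) for key, perms in summary.items()}
```

For example, feeding it the lines of the `audit.log` copied from the attached root volume yields one entry per distinct combination of process, source context, target context, class, and permissive flag; for the three records above that is a single `chronyd` entry with `getattr`, `open`, and `read`.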

Errors in journalctl

Sep 19 16:02:32 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:04:02 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
Sep 19 16:04:02 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:04:32 localhost systemd-networkd-wait-online[1740]: Timeout occurred while waiting for network connectivity.
Sep 19 16:04:32 localhost systemd[1]: Failed to start systemd-networkd-wait-online.service - Wait for Network to be Configured.
Sep 19 16:05:32 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
Sep 19 16:05:32 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:06:58 localhost systemd[1]: Failed to start cloud-init.service - Initial cloud-init job (metadata service crawler).
Sep 19 16:06:58 localhost busctl[1943]: Failed to connect to bus: Connection refused
Sep 19 16:06:59 localhost chronyd[1987]: Could not add source 169.254.169.123
Sep 19 16:06:59 localhost systemd[1]: Failed to start hibinit-agent.service - Initial hibernation setup job.
Sep 19 16:07:02 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
Sep 19 16:07:03 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:08:32 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
Sep 19 16:08:33 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:10:03 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
Sep 19 16:10:03 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System 
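The journal excerpt repeats the same few units many times. A small parser can collapse it into one entry per unit with the time of its first failure, which makes the order of failures easier to read (in this excerpt, `dbus-broker.service` first fails at 16:02:32, before `systemd-networkd-wait-online.service` at 16:04:32). This is a minimal Python sketch, assuming the default short journal output format shown above; `failed_starts` is a hypothetical helper name.

```python
import re

# Assumption: lines look like "Mon DD HH:MM:SS host systemd[1]: Failed to start <unit> - <description>"
LINE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+systemd\[1\]: '
    r'Failed to start (?P<unit>\S+)'
)


def failed_starts(lines):
    """Map each unit to (first timestamp seen, failure count), in order of first failure."""
    result = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # other messages (busctl, chronyd, ...) are ignored
        unit = m["unit"]
        first_seen, count = result.get(unit, (m["ts"], 0))
        # Updating an existing key keeps its original insertion position
        result[unit] = (first_seen, count + 1)
    return result
```

Because Python dicts preserve insertion order, iterating over the result lists units in the order they first failed.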

Warnings logged in the journal

Sep 19 16:02:16 localhost kernel: Speculative Return Stack Overflow: IBPB-extending microcode not applied!
Sep 19 16:02:16 localhost kernel: Speculative Return Stack Overflow: WARNING: See https://kernel.org/doc/html/latest/admin-guide/hw-vuln/srso.html for mitigation options.
Sep 19 16:02:16 localhost kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
Sep 19 16:02:16 localhost kernel: GPT:41943039 != 83886079
Sep 19 16:02:16 localhost kernel: GPT:Alternate GPT header not at the end of the disk.
Sep 19 16:02:16 localhost kernel: GPT:41943039 != 83886079
Sep 19 16:02:16 localhost kernel: GPT: Use GNU Parted to correct GPT errors.
Sep 19 16:02:27 localhost kernel: kauditd_printk_skb: 17 callbacks suppressed
Sep 19 16:02:27 localhost systemd[1]: /usr/lib/systemd/system/update-motd.service:40: Invalid CPU quota '25', ignoring.
Sep 19 16:02:27 localhost systemd[1]: Configuration file /etc/systemd/system/configure-clocksource.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Sep 19 16:02:27 localhost kernel: kauditd_printk_skb: 31 callbacks suppressed
Sep 19 16:02:27 localhost kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
Sep 19 16:02:27 localhost systemd-sysctl[959]: Couldn't write '1' to 'net/bridge/bridge-nf-call-ip6tables', ignoring: No such file or directory
Sep 19 16:02:27 localhost systemd-sysctl[959]: Couldn't write '1' to 'net/bridge/bridge-nf-call-iptables', ignoring: No such file or directory
Sep 19 16:02:28 localhost kernel: i8042: Warning: Keylock active
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd-networkd[1710]: ens5: Failed to configure DHCPv4 client: Permission denied
Sep 19 16:02:32 localhost systemd-networkd[1710]: ens5: Failed
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Start request repeated too quickly.
Sep 19 16:02:32 localhost systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
Sep 19 16:02:32 localhost systemd[1]: Failed to start dbus-broker.service - D-Bus System Message Bus.
Sep 19 16:02:32 localhost systemd[1]: dbus.socket: Failed with result 'service-start-limit-hit'.
Sep 19 16:02:32 localhost systemd[1]: acpid.service: Failed with result 'exit-code'.
Sep 19 16:04:01 localhost systemd[1]: systemd-logind.service: start operation timed out. Terminating.
Sep 19 16:04:02 localhost systemd[1]: systemd-logind.service: Failed with result 'timeout'.
Sep 19 16:04:02 localhost systemd[1]: Failed to start systemd-logind.service - User Login Management.
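The numbers in the GPT warnings are logical block addresses: in the Linux kernel's EFI partition code, the first value is where the primary header expects the backup header to be, and the second is the last LBA of the disk. Assuming 512-byte logical sectors (an assumption, not stated in the logs), a quick conversion shows the partition table was written for a 20 GiB disk while the attached volume is 40 GiB. `last_lba_to_gib` is a hypothetical helper for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
GIB = 1024 ** 3


def last_lba_to_gib(last_lba, sector_bytes=SECTOR_BYTES):
    """Size in GiB of a disk whose last addressable LBA is last_lba (LBAs start at 0)."""
    return (last_lba + 1) * sector_bytes / GIB


# Values from the kernel message "GPT:41943039 != 83886079"
header_expects = last_lba_to_gib(41943039)  # backup header location recorded in the primary header
disk_actual = last_lba_to_gib(83886079)     # actual last LBA of the disk

print(f"image built for {header_expects:.0f} GiB, disk is {disk_actual:.0f} GiB")
```

On its own, this warning indicates a size mismatch between the image and the volume.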

What you expected to happen:

I expected the EC2 instance to boot and run normally, with no issues with networking, sshd, booting, etc.

How to reproduce it (as minimally and precisely as possible):

Use the CIS-hardened AL2023 AMI as the base AMI and build an AMI with this project.

Environment:

  • AWS Region:
  • Instance Type(s):
  • Cluster Kubernetes version:
  • Node Kubernetes version:
  • AMI Version:

Do you mean you're using the AMIs provided by CIS in the AWS Marketplace? e.g.: https://aws.amazon.com/marketplace/pp/prodview-fqqp6ebucarnm

Yes, I am using the AMI provided by CIS as a base AMI to build a new EKS-optimized AMI.

I see that in the cleanup.sh script under the templates/shared/provisioners path, you are cleaning up the necessary OS configuration files from the AMI. This is what is causing the issue in my case. The script runs for both AL2 and AL2023. With AL2, using the CIS-hardened base AMI doesn't cause any issues; with AL2023, however, the generated AMI has problems at boot time because of the cleanup script. Probably the files we remove during cleanup are never recreated at boot time on AL2023, and that causes the network and permission issues in the OS. As a result, necessary services such as D-Bus, systemd-networkd, and systemd-logind are not running, and the instance ends up in a non-bootable state.

you are cleaning up the necessary OS configuration files

This script removes logs and metadata specific to the instance the AMI was built with; it would not cause the kind of issue you're describing. We change several things about the default network setup on AL2023 to make it compatible with EKS, so there's likely a gap or conflict with the CIS AMI.

I am using the AMI provided by CIS as a base

You need to contact CIS for support (details on the Marketplace page).