aws-samples/amazon-eks-custom-amis

Running out of inodes on hardened AMIs

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Configuration

  • Packer Version: 1.9.4

  • Packer Configuration:

instance_type   = "m5.2xlarge"
ami_description = "Amazon EKS Kubernetes AMI based on AmazonLinux2 OS"

ami_block_device_mappings = [
  {
    device_name = "/dev/xvda"
    volume_size = 10
  },
]

launch_block_device_mappings = [
  {
    device_name = "/dev/xvda"
    volume_size = 10
  },
  {
    device_name = "/dev/xvdb"
    volume_size = 100
  },
]

shell_provisioner1 = {
  expect_disconnect = true
  scripts = [
    "scripts/update.sh"
  ]
}

shell_provisioner2 = {
  expect_disconnect = true
  // Pass in values below if enabling proxy support
  environment_vars = [
    // "HTTP_PROXY=xxx",
    // "HTTPS_PROXY=xxx",
    // "NO_PROXY=xxx",
  ]
  scripts = [
    "scripts/partition-disks.sh",
    "scripts/configure-proxy.sh",
    "scripts/configure-containers.sh",
  ]
}

shell_provisioner3 = {
  expect_disconnect = true
  scripts = [
    "scripts/cis-benchmark.sh",
    "scripts/cis-eks.sh",
    "scripts/cleanup.sh",
    "scripts/cis-benchmark-tmpmount.sh",
  ]
}

###################
# Custom Variables
###################
tags = {
  allowed_environment_mgmt = true
  allowed_environment_dev  = true
  allowed_environment_qa   = false
  allowed_environment_prod = false
  allowed_environment_hc   = false
}
vpc_id = "vpc-012345678901234"
subnet_id = "subnet-0eeeeeeeeeeeeeeee4"
kms_key_id = "alias/amis"
encrypt_boot = true
ami_users = [
  "000000000000",
  "111111111111",
  "222222222222",
  "333333333333"
]
ami_name_prefix = "eks-cis-benchmark"
ssh_timeout = "10m"
ssh_interface = "private_ip"
associate_public_ip_address = false

Expected Behavior

Over time, the data stored in /var/lib/containerd may grow in size, but the partition should not exhaust its inodes long before it exhausts its disk space.
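
For context, ext4 fixes its inode count when the filesystem is created, so a partition can run out of inodes while space remains. A minimal way to check the inode budget, assuming the partition is mounted at /var/lib/containerd as in the script below:

# Show the total and free inode counts of the filesystem backing
# /var/lib/containerd; findmnt resolves the mount point to its device.
dev=$(findmnt -n -o SOURCE /var/lib/containerd)
sudo tune2fs -l "$dev" | grep -iE 'inode count|free inodes'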

Actual Behavior

Our EKS worker nodes were unable to pull new container images. Upon investigation, we found that the /var/lib/containerd partition still had 22% of its disk space free according to df -h, but df -i showed inode usage at 100%.
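
For anyone hitting the same symptom, the comparison can be reproduced with standard tools (the mount point matches the partition script below; du --inodes needs GNU coreutils 8.22 or newer):

# Space usage vs. inode usage on the containerd partition.
df -h /var/lib/containerd
df -i /var/lib/containerd

# Rough view of which directories consume the most inodes.
sudo du --inodes -x -d 2 /var/lib/containerd 2>/dev/null | sort -rn | head -20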

Steps to Reproduce

I'm not sure how to reproduce this. We have been running these nodes for about three weeks and only noticed the issue today. To resolve this, I bumped the size of the EBS volume to 100 GB in the .pkvars file and adjusted the partitioning script to allocate more space to /var/lib/containerd, after running into disk space issues with the standard build. Here is the relevant portion of my partition-disks.sh file:

parted -a optimal -s $disk_name \
    mklabel gpt \
    mkpart var ext4 0% 13% \
    mkpart varlog ext4 13% 26% \
    mkpart varlogaudit ext4 26% 39% \
    mkpart home ext4 39% 45% \
    mkpart varlibcontainer ext4 45% 100%
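
If the root cause is inode exhaustion rather than raw capacity, growing the volume or shifting partition percentages only helps indirectly, because ext4 allocates inodes according to the bytes-per-inode ratio chosen at format time. A possible adjustment, assuming partition-disks.sh formats these partitions with mkfs.ext4 (as the fs-type hints above suggest), is to raise the inode density on the containerd partition; the device path and ratio below are illustrative only, not taken from the repository script:

# Illustrative only: format the varlibcontainer partition with a denser
# inode table. ext4's default is one inode per 16384 bytes; halving the
# bytes-per-inode ratio roughly doubles the inode count.
# /dev/nvme1n1p5 is a placeholder for the actual partition device.
mkfs.ext4 -i 8192 -L varlibcontainer /dev/nvme1n1p5

The inode count of an existing ext4 filesystem cannot be raised in place with tune2fs, so the ratio has to be chosen at image-build time.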

Important Factoids

I can't think of any other important factoids beyond what I stated above. The only things that are 'custom' about my build are that the disk was bumped from 64 GB to 100 GB and the partitioning script was changed to give more space to /var/lib/containerd. Other than that, my AMI should look very similar to others.

References

  • #0000