keikoproj/instance-manager

Support warm pools

eytan-avisror opened this issue · 18 comments

ASGs support provisioning pre-warmed instances to make scaling faster.
This could be pretty useful if it cuts down the autoscaling time.

We should explore both the running and stopped options for pre-warmed instances, and support this API if it makes sense:

https://docs.aws.amazon.com/sdk-for-go/api/service/autoscaling/#AutoScaling.PutWarmPool
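For reference, enabling a warm pool on an existing ASG is a single API call. A minimal sketch with the AWS CLI — the ASG name and sizing here are placeholders, not values from this project; the command is built into a variable and only echoed, so it can be inspected before actually applying it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values - substitute your own ASG name and pool sizing.
ASG_NAME="my-instance-group-asg"
POOL_STATE="Stopped"   # "Stopped" keeps warm instances cheap; "Running" joins faster
MIN_SIZE=2

# Build the command first so it can be reviewed (or dry-run) before execution.
CMD=(aws autoscaling put-warm-pool
  --auto-scaling-group-name "$ASG_NAME"
  --pool-state "$POOL_STATE"
  --min-size "$MIN_SIZE")

echo "${CMD[@]}"
# Uncomment to actually apply:
# "${CMD[@]}"
```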

  • AL2 Support
  • Windows Support
  • Bottlerocket Support

It'd be interesting to compare this to the existing startup time - it's around 90-120 seconds for us on our Bottlerocket nodes.

Though, it looks like it might not work right now for us:
Warm pools currently can't be used with ECS, EKS, and self-managed Kubernetes.

I just tested it in stopped mode, and it works! Got nodes to join within a few seconds of a scale event.. not sure why they say it's not supported 🤣 Need to see exactly what is not supported

Pre-warmed stopped instance: 1m 20s from pod scale event to node ready
No pre-warming: 1m 50s - so it potentially shaves a cool 30s off your scaling time,
which is roughly 27% faster

Not much, but nice improvement if you have very peaky traffic

One corner case we need to work around somehow: when the warm pool instances are started, they briefly join the cluster, and when they are stopped they leave.
The problem cases I've seen are:

  • They go Ready for some time (this may cause pods to be scheduled on them)
  • The node objects sometimes stick around as NotReady after the instance is stopped

The ideal solution would be a way to not run the bootstrap script on the pre-warmed instances, and only run it when scaling happens.

I am experimenting with a simple script addition in userdata to do that, e.g.

## Look up this instance's ASG lifecycle state (region hard-coded for the experiment)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
LIFECYCLE=$(aws autoscaling describe-auto-scaling-instances --region us-west-2 --instance-ids $INSTANCE_ID | jq -r ".AutoScalingInstances[].LifecycleState")

## Skip bootstrap while the instance is still in a Warmed:* state
if grep -q "Warmed" <<< "$LIFECYCLE"; then
  exit 0
fi

## Bootstrap script below

But this would mean granting every node access to DescribeAutoScalingInstances, which might be problematic
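For reference, a minimal sketch of the extra IAM policy statement that grant would require (Describe actions in Auto Scaling do not support resource-level restrictions, so it has to be granted on `*`):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "autoscaling:DescribeAutoScalingInstances",
      "Resource": "*"
    }
  ]
}
```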

Reached out to AWS to ask why it's not supported, now the docs say:

If you try using warm pools with Amazon Elastic Container Service (Amazon ECS) or Elastic Kubernetes Service 
(Amazon EKS) managed node groups, there is a chance that these services will schedule jobs on an instance 
before it reaches the warm pool. 

So the issue we need to work around is the instances becoming Ready for a few seconds while they get warmed.

The ideal solution would be to somehow not bootstrap the nodes on warming, only on scaling.

Here is how we are able to work around current limitations:

Add the following conditional template to userdata:

## This section is only added when the spec has a warm pool configured - meaning userdata changes when enabling warmPool. We should reevaluate whether this must be templated or can be kept static.
{{- if .HasWarmPool}}

## This requires the AMI to have awscli + jq, or to install them in a prebootstrap step. If either is missing, this section is skipped and we always bootstrap the node.
if command -v aws >/dev/null && command -v jq >/dev/null; then

    ## Get InstanceID, Region, and lifecycle state. Getting the lifecycle state requires extra permissions on the node; it would be nice to find a way to get it without granting DescribeAutoScalingInstances.
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
    LIFECYCLE=$(aws autoscaling describe-auto-scaling-instances --region $REGION --instance-ids $INSTANCE_ID | jq -r ".AutoScalingInstances[].LifecycleState")
    echo $INSTANCE_ID, $REGION, in state: $LIFECYCLE

    ## If the lifecycle state of the node is Warmed:*, delete the userdata state file and exit without bootstrapping
    if [[ $LIFECYCLE == *"Warmed"* ]]; then
        ## Removing this file guarantees that on the next boot (when scaling happens and the lifecycle is no longer Warmed:*), userdata runs once again and this time bootstraps the node
        rm /var/lib/cloud/instances/$INSTANCE_ID/sem/config_scripts_user
        exit 0
    fi
fi
{{- end}}

< .. call to bootstrap.sh here .. >
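The lifecycle check above can be exercised in isolation. A minimal sketch with a hypothetical `is_warmed` helper and hard-coded sample states in place of the real IMDS/API calls:

```shell
#!/usr/bin/env bash

# Hypothetical helper mirroring the userdata check: warm pool instances
# report lifecycle states such as "Warmed:Pending", "Warmed:Stopped", or
# "Warmed:Running", while in-service instances report "InService".
is_warmed() {
  local lifecycle="$1"
  [[ $lifecycle == *"Warmed"* ]]
}

# Sample states (not fetched from AWS) to show the branch each one takes.
for state in "Warmed:Stopped" "Warmed:Running" "InService"; do
  if is_warmed "$state"; then
    echo "$state -> skip bootstrap"
  else
    echo "$state -> run bootstrap"
  fi
done
```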

This workaround works perfectly; the design cost, however, is:

  • Extra node rotation when enabling the feature (need to evaluate making the block static)
  • Granting every node access to DescribeAutoScalingInstances. The controller can add this when it manages the IAM role and WarmPool is configured, but it's still an extra permission for every node.
  • Not sure if EKS AMIs usually come with awscli & jq preinstalled, but asking users to add these packages to their AMI is a bit painful for this feature.

TBD:

  • Need to work on the upgrade scenario. We can detect if the warm pool instances' launch config/template mismatches the scaling config, and delete the warm pool if that is the case before we proceed to rotate the nodes; the next reconciles will recreate the warm pool with the new launch config/template.
    ** If this is not handled, nodes terminated in the scaling group are replaced by instances from the warm pool that still have the old launch configuration - this actually makes the upgrade process longer.
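The staleness detection above can be sketched as a simple version comparison. A hypothetical `warm_pool_is_stale` helper - the launch template versions here are placeholders, and in practice they would come from DescribeAutoScalingGroups / DescribeAutoScalingInstances:

```shell
#!/usr/bin/env bash

# Hypothetical check: the warm pool is stale when any warm instance's
# launch template version differs from the scaling group's current one.
warm_pool_is_stale() {
  local desired="$1"; shift
  local instance_version
  for instance_version in "$@"; do
    if [[ "$instance_version" != "$desired" ]]; then
      return 0   # stale - delete the warm pool before rotating nodes
    fi
  done
  return 1       # all warm instances already match the new template
}

# Placeholder versions, not fetched from AWS: desired is 3, one warm
# instance is still on version 2.
if warm_pool_is_stale "3" "2" "3"; then
  echo "warm pool is stale"
  # e.g. aws autoscaling delete-warm-pool --auto-scaling-group-name <asg> --force-delete
fi
```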

Ideal state

We need to work with AWS to ask them to make the lifecycle state more easily accessible to userdata; most design flaws above come down to just getting the lifecycle state of the node from userdata.
If they can inject it in cloud-init somehow that would solve most issues.

@backjo funny I thought we were on the bleeding edge of trying this out 🤣

They use the same approach basically of looking at LifecycleState and checking if it has "Warmed" as prefix.
The difference is that Kops uses nodeup as a systemd service to handle bootstrapping, while on EKS we have the bootstrap.sh script, which is called from userdata.

I thought so too!

Yeah, pretty much - and since we have 3 different bootstrap methods (AL2/BottleRocket/Windows) - there will be 3 different places where startup changes need to happen.


Yeah, we can totally do that. I guess the most annoying part is adding DescribeAutoScalingInstances to all nodes & depending on awscli / jq (we will possibly have to do something else on Windows machines)

I wonder if another approach is to vend a small binary similar to Kops, however we would need to come up with a mechanism to get that binary to the node (they use S3 buckets I think).

@backjo is this even possible with bottlerocket TOML style user-data? Can we run an arbitrary script on bottlerocket?

Not out of the box - there are a few levers we can pull, namely bootstrap containers.

PR out for AL2 support, we should also add Windows support

@backjo do you want to leave this open to explore Bottlerocket support, or do you feel like this makes less sense? If it doesn't make much sense for Bottlerocket we can close this issue.

I think it's still worth determining if it impacts startup time at all, but not a priority for me at the moment since it's already about a minute for the node to spin up.

OK, will close this for now since it's not a priority / there is not much to gain from it.
If anyone is interested in this feature for Bottlerocket, we can open a new issue.