Explore concept of VNA - Vertical Node Autoscaler
eytan-avisror opened this issue · 3 comments
In some cases, it may be appropriate to scale nodes vertically, e.g. from m5.xlarge to m5.2xlarge.
For example, when we detect that better bin-packing is possible, or when the IG reaches its max size and there are still pending pods.
We could try to abstract the instance type away completely, for example:
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group
  namespace: instance-manager
spec:
  provisioner: eks
  strategy:
    type: rollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  eks:
    minSize: 3
    maxSize: 6
    configuration:
      # < instanceType not provided >
      instanceFamily: m5 # optional
      resources:
        requests:
          mem: 8Gi
          cpu: 2
        limits:
          mem: 64Gi
          cpu: 16
      ...
Initially we spin up m5.large (if instanceFamily is provided; otherwise we can decide the best match), which provides 2 vCPU / 8 GiB of memory, and we can scale up to m5.4xlarge, which has 16 vCPU / 64 GiB.
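A minimal sketch of how the requests/limits pair could map to a range of instance types, assuming the controller has a per-family size table available. The table below uses the published m5 sizes; `vertical_range` and `next_size_up` are hypothetical helper names, not part of instance-manager.

```python
from typing import List, NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpu: int
    mem_gib: int


# Published m5 sizes (vCPU, GiB), ordered smallest to largest.
M5_FAMILY = [
    InstanceType("m5.large", 2, 8),
    InstanceType("m5.xlarge", 4, 16),
    InstanceType("m5.2xlarge", 8, 32),
    InstanceType("m5.4xlarge", 16, 64),
    InstanceType("m5.8xlarge", 32, 128),
]


def vertical_range(family: List[InstanceType], req_cpu: int, req_mem: int,
                   lim_cpu: int, lim_mem: int) -> List[InstanceType]:
    """Hypothetical helper: types that satisfy requests (floor) and limits (ceiling)."""
    return [
        it for it in family
        if req_cpu <= it.vcpu <= lim_cpu and req_mem <= it.mem_gib <= lim_mem
    ]


def next_size_up(allowed: List[InstanceType], current: str) -> Optional[InstanceType]:
    """Hypothetical helper: next larger allowed type, or None at the ceiling."""
    names = [it.name for it in allowed]
    idx = names.index(current)
    return allowed[idx + 1] if idx + 1 < len(allowed) else None


# Values taken from the spec above: requests 2 CPU / 8Gi, limits 16 CPU / 64Gi.
allowed = vertical_range(M5_FAMILY, req_cpu=2, req_mem=8, lim_cpu=16, lim_mem=64)
initial = allowed[0]  # smallest type that satisfies the requests
```

With these numbers the allowed range runs from m5.large to m5.4xlarge, and a vertical scale-up is just a step to the next entry in that list.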
Another option is to keep this new spec in a separate VerticalScalingPolicy (VSP) resource, so that the IG simply omits instanceType and the VSP is provided as follows:
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: VerticalScalingPolicy
metadata:
  name: default
  namespace: instance-manager
spec:
  instanceFamily: m5 # optional
  resources:
    requests:
      mem: 8Gi
      cpu: 2
    limits:
      mem: 64Gi
      cpu: 16
  scaleTargetRef:
    apiVersion: instancemgr.keikoproj.io/v1alpha1
    kind: InstanceGroup
    name: my-instance-group
We should probably also explore supporting something like the HPA's behavior spec, based on node capacity:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100 # should be between 0 and 40
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max
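To make the policy semantics concrete, here is a rough sketch of how such a behavior block could bound a single scaling step, loosely modeled on how the HPA combines policies. It assumes `current` is some measure of node capacity (for example vCPUs per node), treats any non-Percent policy as an absolute amount, and ignores stabilizationWindowSeconds and periodSeconds bookkeeping entirely. `allowed_change` is a hypothetical name.

```python
import math
from typing import Dict, List


def allowed_change(current: int, policies: List[Dict], select_policy: str = "Max") -> int:
    """Hypothetical helper: largest (or smallest) change permitted in one period."""
    per_policy = []
    for policy in policies:
        if policy["type"] == "Percent":
            # Percent policies are relative to the current capacity.
            per_policy.append(math.ceil(current * policy["value"] / 100))
        else:
            # Assumption: any other type is an absolute amount.
            per_policy.append(policy["value"])
    return max(per_policy) if select_policy == "Max" else min(per_policy)


# The scaleUp policies from the spec above.
scale_up = [
    {"type": "Percent", "value": 100, "periodSeconds": 15},
    {"type": "Pods", "value": 4, "periodSeconds": 15},
]

step_small = allowed_change(4, scale_up)          # both policies allow +4
step_large = allowed_change(16, scale_up)         # Percent allows +16, absolute allows +4
step_min = allowed_change(16, scale_up, "Min")    # the more conservative policy wins
```

With selectPolicy set to Max, a 16 vCPU node could double in one period, while Min would cap the step at 4.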
@backjo any thoughts on this, would you find this useful?
I could see it being useful, though right now we just use multiple IGs with scale-from-zero enabled, and that solves it for us. CA does a decent job of scaling between them. It is a bit tedious, though.
@backjo interesting, so you keep multiple IGs at min 0, and if you need to scale up beyond the max of ASG-1, then ASG-2..N would scale up additional nodes for you? How does CA know which ASG to scale?
In this case would it make more sense to scale vertically with a single IG instead and keep the same range of nodes? e.g. min 3 / max 10
More like: we have multiple IGs with different compute / memory requirements. CA is configured to use the least-waste expander.
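For context on how least-waste picks between IGs, here is a simplified sketch. It follows the description in the Cluster Autoscaler FAQ (least idle CPU after scale-up, ties broken by unused memory), but it treats pending pods as one aggregate demand instead of simulating scheduling as CA does. The group names and the `least_waste` function are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class NodeGroup(NamedTuple):
    name: str
    vcpu: float
    mem_gib: float


def waste(group: NodeGroup, pending_cpu: float, pending_mem: float) -> Tuple[float, float]:
    """Idle CPU and memory fractions if this group alone absorbs the pending demand."""
    nodes = max(
        math.ceil(pending_cpu / group.vcpu),
        math.ceil(pending_mem / group.mem_gib),
        1,
    )
    idle_cpu = 1 - pending_cpu / (nodes * group.vcpu)
    idle_mem = 1 - pending_mem / (nodes * group.mem_gib)
    return idle_cpu, idle_mem


def least_waste(groups: List[NodeGroup], pending_cpu: float, pending_mem: float) -> NodeGroup:
    """Pick the group with the least idle CPU; ties fall back to idle memory."""
    return min(groups, key=lambda g: waste(g, pending_cpu, pending_mem))


# Hypothetical IGs with different compute / memory shapes.
groups = [
    NodeGroup("compute", 8, 16),
    NodeGroup("general", 8, 32),
    NodeGroup("memory", 8, 64),
]

memory_heavy = least_waste(groups, pending_cpu=6, pending_mem=40)
cpu_heavy = least_waste(groups, pending_cpu=7, pending_mem=8)
```

A memory-heavy backlog lands on the memory-optimized group, while a CPU-heavy one ties on idle CPU across all three groups and falls to the compute group on the memory tiebreak.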