kubernetes/autoscaler

Split Cluster Autoscaler codebase

Opened this issue · 12 comments

x13n commented

Which component are you using?:

Cluster Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Cluster Autoscaler has two parts: shared ("core") functionality which is generic to any cloud provider, and cloud-provider-specific code. CA releases contain both the shared part and the code for all (~30) supported cloud providers, along with their dependencies. This approach leads to several problems:

  • With a large number of supported cloud providers comes an even larger number of imported libraries. This increases the risk of a CA release shipping bad dependencies, either malicious or simply buggy, which can lead to security issues (CA permissions in the cluster have to be quite wide) and performance issues (additional logic burning CPU cycles).
  • Maintainability is a problem. Many cloudprovider OWNERS are not k8s org members, so they cannot approve cloudprovider-specific changes. At the same time, people with an affinity to other cloud providers may not care enough, or not be familiar enough, to review or make changes in a specific cloud provider's code.
  • Release qualification doesn't exist. With so many cloud providers it is hard to provide sufficient coverage, and we provide none (beyond unit tests). #5378 is a recent example of that: the 1.26.0 image was released even though it panics at runtime.

Describe the solution you'd like.:

I believe cloud provider specific code should live in separate repositories. OSS Cluster Autoscaler should really be a library that is being used in various ways, rather than a component trying to support all possible cloud providers. There may be an implementation or two that make sense in this repo (grpc & cluster API come to mind), but everything else probably belongs elsewhere.
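To make the library direction a bit more concrete, a provider-specific binary could look roughly like the sketch below. This is only an illustration: the core.NewAutoscaler constructor, its Options struct and the example.com provider module are assumptions, not code that exists in this repo today; only the cloudprovider.CloudProvider interface is real.

```go
// Hypothetical sketch only: assumes cluster-autoscaler exposes a stable "core"
// library and that a provider ships its own CloudProvider implementation in a
// separate module. The import paths marked "assumed"/"hypothetical" do not
// exist in this form today.
package main

import (
	"context"

	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider" // real interface package
	core "k8s.io/autoscaler/cluster-autoscaler/core"     // assumed library entrypoint

	mycloud "example.com/mycloud-autoscaler/provider" // hypothetical provider module
)

func main() {
	// The provider implements cloudprovider.CloudProvider (NodeGroups,
	// NodeGroupForNode, Refresh, ...) against its own cloud APIs.
	var provider cloudprovider.CloudProvider = mycloud.NewProvider()

	// Assumed constructor: wire the provider into the shared scale-up/scale-down
	// loop directly, instead of selecting it via a build-time provider registry.
	autoscaler := core.NewAutoscaler(core.Options{CloudProvider: provider})
	autoscaler.Run(context.Background())
}
```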

Describe any alternative solutions you've considered.:

  • Introduce more policies around cloud providers (i.e. iterate on https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/POLICY.md). This might solve the maintainability issues, but release qualification, security & performance risks would be hard to mitigate this way.
  • Release a separate CA image for each cloud provider. This was previously discussed at a SIG meeting and would address the security/performance concerns with multiple dependencies. It might potentially also help with release qualification. Maintainability would still be a concern.
  • Combine the two above. This presents a chance to address all issues, but it would require building a lot of infrastructure which - in my opinion - shouldn't really be a part of this repository.

Additional context.:

elmiko commented

we talked about this at the SIG meeting today and i think there was a good discussion. in general, i think it's a cool idea but i also share some of the hesitancy that was discussed in the meeting, namely that this is a really big change which could have near-term consequences in the form of bugs and other problems as we figure out how to do the separation, but also longer-term effects around maintenance and involvement.

with that said though, i wanted to share some thoughts.

in some respects this conversion could look very similar to what happened with the cloud controller managers and their migration from the kubernetes main repository to external repositories. if we were to approach this topic from that perspective, then i think we would want to start by making a KEP about how the deprecation will happen and on what sort of time frame. after defining some of the time frame, i think we would also want to follow SIG cloud provider's example and demonstrate how the gRPC provider could be used as a mechanism for migrating all the "in-tree" providers to external repositories.

imo, using the gRPC provider as a gateway to the cluster autoscaler seems like an easier path than changing the autoscaler into a more library-ish code package which is then imported by provider-specific implementations.

this doesn't completely answer all the questions, especially with regards to involvement and maintenance, but we could demonstrate a very clean workflow for providers to migrate their code into external gRPC providers. to go a step further, we could then provide more in-depth helm charts and guidance on how best to deploy the autoscaler with a gRPC client next to it (although this exists to some level already). we might even see the growth of an operator for deploying the cluster autoscaler based on the need for variance of the underlying cloud provider.
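to make the sidecar idea a bit more concrete, here is a very rough sketch of what an external provider process could look like. the pb package and service names below are placeholders standing in for generated stubs, not the real externalgrpc proto definitions (those live under cluster-autoscaler/cloudprovider/externalgrpc), so treat this as shape only:

```go
// rough sketch only: the pb package stands in for generated gRPC stubs of a
// provider service; the real wire format is defined under
// cluster-autoscaler/cloudprovider/externalgrpc in this repo.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	pb "example.com/mycloud-autoscaler/protos" // hypothetical generated stubs
)

// server would answer node-group queries (list groups, group for node,
// increase size, ...) by calling the cloud's own machine-group APIs.
type server struct {
	pb.UnimplementedCloudProviderServer
}

func main() {
	// listen on a local socket shared with the cluster-autoscaler container,
	// which runs with its external gRPC provider pointed at this address
	// (see the externalgrpc docs for the exact flags).
	lis, err := net.Listen("unix", "/run/ca-provider.sock")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterCloudProviderServer(s, &server{})
	log.Fatal(s.Serve(lis))
}
```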

Release qualification doesn't exist. With so many cloud providers it is hard to provide sufficient coverage, and we provide none (beyond unit tests). Cluster Autoscaler: Segmentation violation caused by HintingSimulator predicateChecker nil pointer #5378 is a recent example of that: the 1.26.0 image was released even though it panics at runtime.

to this point, i have been working with some of the Cluster API community members to improve our Kubemark provider, and we even have a work-in-progress PR that will run some autoscaler testing. i realize that using kubemark does not fully test all the provider-specific bits, but it would allow us to create a lower-cost method for running pre-submit end-to-end tests which would catch things like the bug referenced in the quote above.

I support a stronger separation between things like scheduling simulations and the cloud providers. But separate repositories sound like overkill to me. That would certainly increase the release burden significantly, and typically one is only interested in the K8s updates anyway. But a monorepo approach with each cloud provider as a Go module sounds more doable. And perhaps an image per cloud provider.

In gardener we have to maintain a fork of this repo and add Machine Controller Manager as one of the cloud providers. It is a continuous cost of syncing with upstream, only because all cloud providers are included (in-tree) as part of this repo, and I completely support an out-of-tree approach.

There is also quite an overlap between what cluster autoscaler does and what any cloud provider would like to do w.r.t. management of machines and machine groups. This often breaks the single responsibility principle, causing race conditions. Therefore I also fully support the need to use CA as a standalone library with well-defined APIs, easing its consumption by any provider. This will also ensure that there is just one actor responsible for managing machines and machine groups, bringing determinism w.r.t. expected behavior.

elmiko commented

In gardener we have to maintain a fork of this repo and add Machine Controller Manager as one of the cloud providers. It is a continuous cost of syncing with upstream, only because all cloud providers are included (in-tree) as part of this repo, and I completely support an out-of-tree approach.

just out of curiosity, would it be possible to use the gRPC provider to keep your updates so that you don't need to fork the autoscaler?

Yes, we would like to avoid maintaining a fork. Is the gRPC plugin feature stable? Apologies for my ignorance, but it is still marked as a proposal. Is there a tracking issue for this that has been closed?

x13n commented

While I was the one to initially create this issue, I agree with the concern raised at the SIG meeting that it would cause support for less actively maintained cloud provider code to die out, with CA ceasing to work in those environments as a result. With that in mind, I think the "alternative considered" I originally posted is actually better, with some slight modifications. Specifically:

  1. I think all the cloud providers that are supported today should stay in this repository, but be better separated. Each cloud provider would have its own main.go entrypoint (doing #5432 is probably a prerequisite here), which would enable building separate images; a sketch of such an entrypoint follows this list.
  2. The release process to build all these images would rely on e2e tests - they would have to be specified and maintained by each cloud provider's maintainers.
  3. gRPC would be the recommended way of extending CA support for other cloud providers (deployed e.g. as cluster-autoscaler-grpc container along with a cloud provider sidecar).
  4. New cloud providers that cannot use gRPC for any reason, would be able to import CA code as a library, which should be much easier after (1) is done. Perhaps most importantly, it shouldn't require maintaining a fork.
  5. Adding a new cloud provider in-tree shouldn't generally happen. It would require a special exception after agreement within the SIG.
  6. Once an in-tree cloud provider is deprecated & removed based on the existing policy, it would be treated as a completely new one in case someone wants to add it back.
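To illustrate item (1), a per-provider entrypoint could be a thin main.go that compiles in exactly one provider. The run package and its Options below are hypothetical, and the builder function name may not match what cloudprovider/aws actually exports; this is a sketch of the intended shape, not existing code.

```go
// Hypothetical per-provider entrypoint (e.g. cloudprovider/aws/cmd/main.go):
// only one provider and its SDK end up in the binary/image. The run package
// and its Options are assumptions; the aws builder function name may differ
// from the real export in cloudprovider/aws.
package main

import (
	aws "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/aws" // real provider package
	run "k8s.io/autoscaler/cluster-autoscaler/run"               // assumed shared entrypoint
)

func main() {
	// Instead of a registry of ~30 providers selected by a flag at build time,
	// each image compiles in exactly one provider builder.
	run.Start(run.Options{
		CloudProviderBuilder: aws.BuildAWS,
	})
}
```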

Yes, we would like to avoid maintaining a fork. Is the gRPC plugin feature stable? Apologies for my ignorance, but it is still marked as a proposal. Is there a tracking issue for this that has been closed?

The proposals dir contains historical proposals as well. To my knowledge, gRPC support was added and is considered done. I don't think we had a gRPC issue specifically; the closest one is probably #3644.

New cloud providers that cannot use gRPC for any reason, would be able to import CA code as a library, which should be much easier after (1) is done. Perhaps most importantly, it shouldn't require maintaining a fork.

This would be the most preferred option for us (if and when it comes). The main reason: CA does not know or understand anything beyond node groups. In gardener, machine-controller-manager has its own set of custom resources - MachineDeployment, MachineSet and Machine (inspired by Deployment -> ReplicaSet -> Pod) - to manage machine clusters. There is quite an overlap between the controllers created by machine-controller-manager and cluster-autoscaler. Therefore the best way forward is for CA to be used as a library. Cloud providers can then choose what they wish to leverage from CA and where they wish to take over. This enforces the single-actor principle and makes things quite deterministic.
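As a purely hypothetical sketch of what consuming CA as a library could look like for machine-controller-manager, a node group adapter might map CA's group operations onto MachineDeployment replicas. The mcm client package and its methods below are invented for illustration; only two representative methods (mirroring cloudprovider.NodeGroup's TargetSize and IncreaseSize) are shown:

```go
// Hypothetical adapter: exposes a machine-controller-manager MachineDeployment
// to cluster-autoscaler as a node group, so CA decides *when* to scale while
// MCM stays the single actor that actually creates/deletes machines.
// The mcm client below is an assumption, not an existing package.
package mcmadapter

import (
	"context"
	"fmt"

	mcm "example.com/gardener/machine-controller-manager/client" // hypothetical typed client
)

type machineDeploymentNodeGroup struct {
	client    mcm.Interface
	namespace string
	name      string
	maxSize   int
}

// TargetSize reads the MachineDeployment's desired replica count, mirroring
// the NodeGroup method of the same name.
func (g *machineDeploymentNodeGroup) TargetSize() (int, error) {
	md, err := g.client.MachineDeployments(g.namespace).Get(context.TODO(), g.name)
	if err != nil {
		return 0, err
	}
	return int(md.Spec.Replicas), nil
}

// IncreaseSize bumps the replica count; MCM then reconciles MachineSets and
// Machines, keeping machine lifecycle management in a single controller.
func (g *machineDeploymentNodeGroup) IncreaseSize(delta int) error {
	current, err := g.TargetSize()
	if err != nil {
		return err
	}
	if current+delta > g.maxSize {
		return fmt.Errorf("size increase too large: %d > max %d", current+delta, g.maxSize)
	}
	return g.client.MachineDeployments(g.namespace).Scale(context.TODO(), g.name, int32(current+delta))
}
```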

In the meantime we can evaluate the gRPC external provider approach and see whether we could consume it.

I will preface this by saying that since December 2022 I am no longer an AWS employee and don't have a dog in this fight.

There is also quite an overlap between what cluster autoscaler does and what any cloud provider would like to do w.r.t. management of machines and machine groups. This often breaks the single responsibility principle, causing race conditions.

I could not agree more. This is the fundamental problem with CA's current lack of extensibility and the reason why things like Karpenter were created.

Therefore I also fully support the need to use CA as a standalone library with well-defined APIs, easing its consumption by any provider. This will also ensure that there is just one actor responsible for managing machines and machine groups, bringing determinism w.r.t. expected behavior.

This is, in fact, how the karpenter-core interfaces are designed. Specifically, the CloudProvider interface designates the cloud provider as responsible for creating, deleting, and fetching Machines (Nodes).

The whole concept of a "group of nodes" is deliberately absent in this layer in Karpenter, as it should be, so that different cloud providers' compute infrastructure APIs can be used to manage individual instances instead of relying on a Managed Node Group / AutoScalingGroup API as the only mechanism by which to provision/deprovision nodes. In the case of AWS, Karpenter can make use of the EC2 CreateFleet API instead of the EC2 AutoScalingGroup APIs in order to make better, more fine-grained node provisioning decisions.
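For reference, the karpenter-core CloudProvider interface is shaped roughly like the sketch below. This is a paraphrase from memory with stand-in types, not the authoritative definition (check the karpenter-core repository for that); the point is simply that it deals in individual machines and instance types, with no node-group abstraction anywhere:

```go
// Approximate shape of karpenter-core's CloudProvider interface, paraphrased
// for illustration with stand-in types; consult the karpenter-core repository
// for the authoritative definition. Note the absence of any "node group":
// the provider creates, deletes and fetches individual machines.
package sketch

import "context"

// Machine and InstanceType stand in for karpenter's own API types.
type Machine struct{ ProviderID string }
type InstanceType struct{ Name string }

type CloudProvider interface {
	// Create launches a single machine satisfying the given requirements.
	Create(ctx context.Context, machine *Machine) (*Machine, error)
	// Delete terminates the machine backing the given object.
	Delete(ctx context.Context, machine *Machine) error
	// Get and List resolve machines individually, not as members of a
	// fixed-size, fixed-shape group.
	Get(ctx context.Context, providerID string) (*Machine, error)
	List(ctx context.Context) ([]*Machine, error)
	// GetInstanceTypes returns what the cloud can offer, letting the core
	// choose per-node shapes instead of a single size per group.
	GetInstanceTypes(ctx context.Context) ([]*InstanceType, error)
}
```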

Sidenote: I think the design of cluster-autoscaler might be the reason why I couldn't write a Karpenter driver for GKE.
When I went to create a Karpenter provider for GKE, I discovered that GKE has no ability to manually join a GCE worker node to a GKE cluster. As a GKE user, you are forced to use the GKE node pools, which are managed by GKE's auto-provisioner. If I had the ability to join a manually provisioned GCE instance, I could have written a Karpenter driver for GKE that would use individual calls to create instances or use instanceTemplates manually. If GKE is using cluster-autoscaler for both its autoscaling and node management, this explains why there is no support for joining individual nodes to a GKE cluster: cluster-autoscaler simply requires the concept of a NodeGroup (NodePool in GKE) and its interfaces are married to the concept of a group of nodes having the same instance size...

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale

/lifecycle frozen