projectcalico/calico

Maintaining a desired number of route reflector nodes automatically

stoyanr opened this issue · 4 comments

Expected Behavior

The documentation at https://docs.projectcalico.org/v3.4/usage/configuration/bgp describes how to set up individual cluster nodes as route reflectors. We would very much like to use route reflectors in our clusters, but we have the following issue: when nodes come and go, it would be difficult for us to maintain a desired number of route reflectors manually. We would expect there to be a straightforward way to automate this process.

Current Behavior

It is not possible to automatically configure a node as a route reflector if another route reflector node is removed from the cluster, or in general to maintain a desired number of route reflectors without manual effort.

Possible Solution

One possible solution would be to introduce a new Kubernetes controller that monitors resources of type RouteReflectorNodeSet and maintains a desired number of route reflector nodes in the cluster, similar to the way a ReplicaSet is used to maintain a desired number of pods.
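To make the idea more concrete, here is a minimal sketch of the reconciliation step such a controller might run. It is only an illustration under assumptions: the `Node` type, the `reconcile` function, and the rule of picking candidates in name order are all hypothetical, and the sketch only computes a plan. It does not talk to Kubernetes or Calico.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a cluster node as seen by the controller."""
    name: str
    ready: bool
    is_reflector: bool


def reconcile(nodes, desired):
    """Return (promote, demote): node names to turn into route reflectors
    and node names that should stop being route reflectors."""
    healthy_rr = sorted(n.name for n in nodes if n.ready and n.is_reflector)
    unhealthy_rr = sorted(n.name for n in nodes if not n.ready and n.is_reflector)
    candidates = sorted(n.name for n in nodes if n.ready and not n.is_reflector)

    # Reflectors on nodes that are not ready are always dropped.
    demote = list(unhealthy_rr)
    promote = []
    if len(healthy_rr) > desired:
        # Too many healthy reflectors: keep the first `desired` by name.
        demote += healthy_rr[desired:]
    else:
        # Too few: promote ready nodes until the desired count is reached
        # (or until there are no candidates left).
        promote = candidates[: desired - len(healthy_rr)]
    return promote, demote


nodes = [
    Node("a", ready=True, is_reflector=True),
    Node("b", ready=False, is_reflector=True),
    Node("c", ready=True, is_reflector=False),
    Node("d", ready=True, is_reflector=False),
]
promote, demote = reconcile(nodes, desired=2)
print(promote, demote)
```

A real controller would run this on every node add, remove, or status change, and would probably want a smarter selection rule than name order (for example, spreading reflectors across zones).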

Context

We would very much like to use route reflector nodes in our clusters, but we need all lifecycle management operations to be fully automated, which means that configuring route reflectors manually is not an option. Without automation, we would have to stick to the full node mesh.

Your Environment

We are using Kubernetes 1.12 (soon 1.13). We plan to use Gardener to manage the lifecycle of a large number of clusters for a private cloud. This means that the cluster configuration must be fully automated.

We plan to use Machine Controller Manager to manage machines. This means that if a node goes down and is replaced by another node, without an automated solution nobody would even notice that something else has to change (e.g. the new node should become a route reflector).

We would also like this. We have a Kubernetes cluster with >200 nodes in which we just configured some in-cluster route reflectors, as we felt the full BGP mesh was in danger of tipping over.

Another possible and maybe lighter-weight solution would be to specify a Kubernetes label selector in the Calico configuration for nodes that should be configured as route reflectors. Then whatever orchestration system the user is using would be responsible for ensuring that the desired number of nodes with that label exists, and a Calico controller would be responsible for configuring them as route reflectors in the Calico datastore. For us, that's a Kops InstanceGroup, which is an AWS autoscaling group under the hood.
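As a rough sketch of what the controller side of this could look like: given the labels on each node, an equality-based selector, and the set of nodes currently configured as route reflectors, it only has to compute the difference. The function names, the `route-reflector` label, and the node names below are made up for illustration, and actually writing the change to the Calico datastore is left out.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def plan_from_labels(node_labels, selector, configured):
    """Return (to_configure, to_unconfigure) as sorted lists of node names.

    node_labels: dict of node name -> dict of labels
    selector:    dict of required label key -> value (equality-based only)
    configured:  set of node names currently acting as route reflectors
    """
    wanted = {name for name, labels in node_labels.items()
              if matches(labels, selector)}
    return sorted(wanted - configured), sorted(configured - wanted)


node_labels = {
    "n1": {"route-reflector": "true"},
    "n2": {},
    "n3": {"route-reflector": "true"},
}
to_configure, to_unconfigure = plan_from_labels(
    node_labels, {"route-reflector": "true"}, configured={"n2", "n3"}
)
print(to_configure, to_unconfigure)
```

With this split, the count is owned entirely by whatever labels the nodes (here, the instance group), and the controller stays stateless.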

juris commented

Any progress on this one? Just wondering how other people deal with large Calico clusters without automated route reflector provisioning.

mhmxs commented

I have started working on RR autoscaling and wrote up a proposal for the concept I have in mind :) https://github.com/mhmxs/calico-route-reflector-operator-proposal. Please feel free to join the effort and share your ideas, concerns, or advice.

For what we want to do in core Calico, see #5311

I think a feature like this is best placed as a third-party addon that uses the Calico API for now.