mariadb-operator/mariadb-operator

[Feature] Support joining an existing Galera cluster (e.g. across 2 different k8s clusters)


Is your feature request related to a problem? Please describe.
After deploying the operator in a k8s cluster, we can create a new Galera cluster.
But if several k8s clusters need to share the same Galera cluster, that is not possible.

Describe the solution you'd like
The ability to create a MariaDB CR whose wsrep_cluster_address includes the DB nodes in another Kubernetes cluster (which are exposed via a LoadBalancer).

Hey there @starizard! Thanks for bringing this up

We have thought about this already:

Although possible, there are a couple of things that need to be covered for this:

  • A per-node Service, so that each node can be reached individually from outside the cluster (LoadBalancer)
  • We will need to extend spec.galera to specify:
    • Extra peer FQDNs to connect to. As you said, these peers will be included in wsrep_cluster_address
    • How to authenticate connections with them
    • How to trust TLS connections with them
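
As a rough illustration of the first sub-point, the sketch below shows how local and extra peers could be merged into a single wsrep_cluster_address value. The function name, the example hostnames, and the idea of passing peers as plain lists are assumptions made for illustration, not the operator's actual API; only the comma-separated gcomm:// format comes from Galera itself.

```python
from typing import Iterable


def build_cluster_address(local_peers: Iterable[str], extra_peers: Iterable[str]) -> str:
    """Join local and external peer addresses into a gcomm:// URL.

    Hypothetical helper: blank entries and duplicates are dropped while
    keeping first-seen order.
    """
    ordered: list[str] = []
    for peer in [*local_peers, *extra_peers]:
        peer = peer.strip()
        if peer and peer not in ordered:
            ordered.append(peer)
    return "gcomm://" + ",".join(ordered)


# Example hostnames are made up for illustration.
local = [
    "mariadb-galera-0.mariadb-galera-internal.default.svc.cluster.local",
    "mariadb-galera-1.mariadb-galera-internal.default.svc.cluster.local",
]
external = ["galera-0.cluster-b.example.com", "galera-1.cluster-b.example.com"]
print(build_cluster_address(local, external))
```

Authentication and TLS trust for the extra peers are deliberately left out here, since how they would be modelled in the spec is still open.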

This new topology has implications we haven't faced before, so we will need to further investigate these points:

  • An operator can only manage Pods within the cluster it runs in, so how do we manage the Pods in an external cluster? With another operator running in the external cluster? If that were the case, we would need to take the following into account:
    • There should be only one Galera cluster running across 2 Kubernetes clusters. Creating 2 different Galera clusters will mean that we have a split brain
    • The cluster recovery process gets tricky: we cannot control external Pods, so we can't read their sequence numbers to find out which node is the most advanced and therefore where to bootstrap the new cluster
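
To make the recovery problem concrete, here is a minimal sketch of choosing a bootstrap node from saved Galera state. It assumes each node's grastate.dat content is available as a string and uses None for nodes whose state cannot be read, such as Pods in an external cluster. The function names and the input shape are hypothetical, and the operator's real recovery logic may differ.

```python
import re
from typing import Optional


def parse_seqno(grastate: str) -> Optional[int]:
    """Return the seqno recorded in grastate.dat content, or None if absent."""
    match = re.search(r"^seqno:\s*(-?\d+)\s*$", grastate, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pick_bootstrap_node(states: dict[str, Optional[str]]) -> Optional[str]:
    """Pick the node with the highest seqno, or None if that cannot be decided.

    `states` maps node name -> grastate.dat content, with None for nodes
    whose state cannot be read (e.g. Pods in an external cluster).
    """
    seqnos: dict[str, int] = {}
    for node, content in states.items():
        seqno = parse_seqno(content) if content is not None else None
        if seqno is None or seqno < 0:
            # Unknown state: this node might be ahead, so refuse to choose.
            return None
        seqnos[node] = seqno
    if not seqnos:
        return None
    return max(seqnos, key=lambda name: seqnos[name])
```

With one unreadable external node in `states`, the function returns None, which is exactly the situation described above: without the external sequence numbers there is no safe choice.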

To overcome the previous points, I can think of:

  • Not allowing writes on the external cluster. This way, we know for sure that the most advanced node will be in the current cluster and we can perform the cluster recovery. However, we still need to restart the Pods in the external cluster when that happens, and this cross-cluster coordination will be tricky if not impossible.
  • Have 2 different Galera clusters in 2 different Kubernetes clusters and set up replication between them. Writes can only happen on one of the clusters, and the initial replication setup can be done by one of the operators, since it is done via cross-cluster SQL statements
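
For the second option, the cross-cluster SQL could look roughly like what the sketch below generates. It builds MariaDB's CHANGE MASTER TO and START SLAVE statements for a node in the read-only cluster. The function name, the example host and user, and the choice of GTID positioning with TLS enabled are assumptions for illustration, not something the operator is confirmed to do.

```python
def replication_statements(host: str, port: int, user: str, password: str) -> list[str]:
    """Build the SQL a replica-side node would run to follow a remote primary."""

    def quote(value: str) -> str:
        # Minimal escaping for illustration only; real code should rely on
        # parameterized statements or the driver's own escaping.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        "MASTER_SSL=1, "            # assumption: the link between clusters uses TLS
        "MASTER_USE_GTID=slave_pos"  # assumption: GTID-based positioning
    )
    return [change_master, "START SLAVE"]


# Hypothetical endpoint exposed by the primary cluster's LoadBalancer.
for statement in replication_statements("galera.cluster-a.example.com", 3306, "repl", "secret"):
    print(statement + ";")
```

Because these are plain SQL statements sent to a reachable endpoint, a single operator can issue them without controlling any Pods in the other cluster.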

Taking into account all my previous points, the latter option seems to be the most reasonable one TBH, and it seems to be a common pattern we can automate:

Sorry if this was too long, happy to hear your thoughts!

I would love for this to also make it possible to include servers outside of Kubernetes.