openebs-archive/jiva-operator

Need anti-affinity policies for replica pods.


kmova commented

Describe the problem/challenge you have

Distributed applications like MongoDB require their volumes to be spread across multiple nodes - just like their own replicas. Co-locating them will cause performance and high-availability issues.

Consider this case of a 3-replica MongoDB StatefulSet. The mongo pods are neatly distributed across three different nodes:

kiran_mova_mayadata_io@kmova-dev:mongodb$ kubectl get pods -o wide | grep mongo
mongo-0                  2/2     Running   0          56m   10.0.2.15     gke-kmova-helm-default-pool-30f2c6c6-1942   <none>           <none>
mongo-1                  2/2     Running   0          55m   10.0.0.21     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
mongo-2                  2/2     Running   0          54m   10.0.1.12     gke-kmova-helm-default-pool-30f2c6c6-qf2w   <none>           <none>

However, the target pods are packed onto a single node:

kiran_mova_mayadata_io@kmova-dev:mongodb$ kubectl get pods -o wide -n openebs | grep jiva-ctrl
pvc-1b21ac95-fd9f-466f-a39b-c1e1ab6e6cb5-jiva-ctrl-75d9f46fvxng   1/1     Running   0          58m   10.0.0.22     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
pvc-96120cb1-0f36-4a53-9263-6af8b8cc5a66-jiva-ctrl-6c5db7d7hq6n   1/1     Running   0          59m   10.0.0.17     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
pvc-faa218d5-46c6-4bb7-a598-024970cf9b4c-jiva-ctrl-548585cnz9js   1/1     Running   0          59m   10.0.0.20     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
  • A failure of node 3jsv will cause all mongo pods to go down, since every target pod runs there.
  • The mongo pods on nodes other than 3jsv will have to go over the network to access their data.

A similar issue exists (but slightly more severe) with the jiva replica pods getting scheduled to the same node:

pvc-1b21ac95-fd9f-466f-a39b-c1e1ab6e6cb5-jiva-rep-0               1/1     Running   0          54m   10.0.0.24     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
pvc-96120cb1-0f36-4a53-9263-6af8b8cc5a66-jiva-rep-0               1/1     Running   0          55m   10.0.0.19     gke-kmova-helm-default-pool-30f2c6c6-3jsv   <none>           <none>
pvc-faa218d5-46c6-4bb7-a598-024970cf9b4c-jiva-rep-0               1/1     Running   0          55m   10.0.2.17     gke-kmova-helm-default-pool-30f2c6c6-1942   <none>           <none>
  • Two of the replicas are on 3jsv - which means the data for two of the mongo pods is only on 3jsv. Failure of 3jsv will cause MongoDB data to be lost.

Describe the solution you'd like
Jiva Volume Policies should allow specifying an anti-affinity rule that ensures the replica pods of a given application are not co-located on the same node.
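One possible shape for such a rule, expressed with the standard Kubernetes pod-anti-affinity fields, could look like the sketch below. This is purely illustrative - the field layout under spec.replica and the label value are assumptions, not an existing API:

```yaml
# Illustrative sketch only -- a possible way a JivaVolumePolicy could
# express replica anti-affinity (field names are assumptions).
apiVersion: openebs.io/v1alpha1
kind: JivaVolumePolicy
metadata:
  name: example-policy            # hypothetical name
spec:
  replica:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                # label shared by all replica pods of the same application
                openebs.io/replica-anti-affinity: mongo
            topologyKey: kubernetes.io/hostname
```

With a hard (required) rule like this, the scheduler would refuse to place two replicas of the same application on one node; a preferredDuringSchedulingIgnoredDuringExecution variant could be offered for clusters with fewer nodes than replicas.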

Anything else you would like to add:
This feature was supported with the external-provisioned Jiva volumes - using ReplicaAntiAffinityTopoKey and applying a unique openebs.io/replica-anti-affinity label to all the PVCs belonging to the same application.
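For reference, usage with the external provisioner looked roughly like this: each PVC of the application carries the same label value, which the provisioner uses to spread the replicas apart. The PVC and StorageClass names below are hypothetical:

```yaml
# Sketch of the external-provisioner approach: all PVCs of one
# application share the same openebs.io/replica-anti-affinity value.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-mongo-0               # hypothetical PVC name
  labels:
    openebs.io/replica-anti-affinity: mongo   # same value on every PVC of the app
spec:
  storageClassName: openebs-jiva      # hypothetical StorageClass name
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```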

Workaround
When using single-replica volumes, use local storage directly.

While migrating from older external-provisioned volumes to CSI volumes, we will need to schedule the sts pods on the same nodes as the old volumes. Adding the ability to specify node affinity rules for replicas in the policy will help with this migration.
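A node affinity rule for replicas could be sketched as below, pinning a replica to the node that already holds the old volume's data. The placement of the affinity block inside the policy spec is an assumption:

```yaml
# Illustrative sketch: pin replica pods to a specific node during
# migration (field layout under spec.replica is an assumption).
spec:
  replica:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - gke-kmova-helm-default-pool-30f2c6c6-3jsv
```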