amaizfinance/redis-operator

Clean install does not work

Sieabah opened this issue · 8 comments

Applying the CRDs and required operator resources in its own namespace does create the operator and a cluster. However, nothing can connect to that cluster reliably because the "master" Service points to a Redis slave instance.

{"level":"info","ts":"2020-05-11T01:40:25.649Z","logger":"controller_redis","msg":"Applied *v1.Service","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.663Z","logger":"controller_redis","msg":"Applied *v1.Service","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.683Z","logger":"controller_redis","msg":"Applied *v1.Service","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.709Z","logger":"controller_redis","msg":"Applied *v1.Service","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.739Z","logger":"controller_redis","msg":"Applied *v1.Service","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.747Z","logger":"controller_redis","msg":"Applied *v1.Secret","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.754Z","logger":"controller_redis","msg":"Applied *v1.ConfigMap","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:25.766Z","logger":"controller_redis","msg":"Applied *v1beta1.PodDisruptionBudget","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:26.005Z","logger":"controller_redis","msg":"Applied *v1.StatefulSet","Namespace":"default","Redis":"shared-redis"}
{"level":"info","ts":"2020-05-11T01:40:26.304Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 0 are healthy"}
{"level":"info","ts":"2020-05-11T01:40:26.542Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 0 are healthy"}
{"level":"info","ts":"2020-05-11T01:40:29.198Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 0 are healthy"}
{"level":"info","ts":"2020-05-11T01:40:39.630Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 0 are healthy"}
{"level":"info","ts":"2020-05-11T01:41:00.368Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 0 are healthy"}
{"level":"info","ts":"2020-05-11T01:41:14.977Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 1 are healthy"}
{"level":"info","ts":"2020-05-11T01:41:15.190Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 1 are healthy"}
{"level":"info","ts":"2020-05-11T01:41:41.590Z","logger":"controller_redis","msg":"Error creating Redis replication, requeue","Namespace":"default","Redis":"shared-redis","error":"minimum replication size is not met, only 1 are healthy"}
{"level":"info","ts":"2020-05-11T01:41:59.595Z","logger":"controller_redis","msg":"no master discovered, requeue","Namespace":"default","Redis":"shared-redis","error":"no master discovered","replication":[{"Host":"10.60.1.15","Port":"6379"},{"Host":"10.60.0.24","Port":"6379"}]}
{"level":"info","ts":"2020-05-11T01:42:05.958Z","logger":"controller_redis","msg":"no master discovered, requeue","Namespace":"default","Redis":"shared-redis","error":"no master discovered","replication":[{"Host":"10.60.1.15","Port":"6379"},{"Host":"10.60.0.24","Port":"6379"}]}
{"level":"info","ts":"2020-05-11T01:42:35.246Z","logger":"controller_redis","msg":"no master discovered, requeue","Namespace":"default","Redis":"shared-redis","error":"no master discovered","replication":[{"Host":"10.60.1.15","Port":"6379"},{"Host":"10.60.0.24","Port":"6379"},{"Host":"10.60.2.20","Port":"6379"}]}
{"level":"info","ts":"2020-05-11T01:42:41.245Z","logger":"controller_redis","msg":"no master discovered, requeue","Namespace":"default","Redis":"shared-redis","error":"no master discovered","replication":[{"Host":"10.60.1.15","Port":"6379"},{"Host":"10.60.0.24","Port":"6379"},{"Host":"10.60.2.20","Port":"6379"}]}
{"level":"info","ts":"2020-05-11T01:47:19.691Z","logger":"controller_redis","msg":"no master discovered, requeue","Namespace":"default","Redis":"shared-redis","error":"no master discovered","replication":[{"Host":"10.60.0.24","Port":"6379"},{"Host":"10.60.2.20","Port":"6379"}]}

The operator seems to know that there is no master, yet no election appears to take place.
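For illustration, here is a hypothetical Go sketch of the step that appears to be missing: among healthy pods, keep an existing master if there is one; otherwise, choose a replica to promote. The `Node` type, the function names, and the "highest replication offset wins" rule are assumptions, loosely modeled on how Sentinel prefers the most up-to-date replica. The error text only mirrors the log lines above. This is not the operator's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Node is a hypothetical view of one Redis pod as seen by a reconciler.
type Node struct {
	Host    string
	Port    string
	Healthy bool
	Role    string // "master" or "slave", as reported by INFO replication
	Offset  int64  // replication offset; higher means more up to date
}

var errNoMaster = errors.New("no master discovered")

// discoverMaster returns the first healthy node that reports itself as master.
func discoverMaster(nodes []Node) (Node, error) {
	for _, n := range nodes {
		if n.Healthy && n.Role == "master" {
			return n, nil
		}
	}
	return Node{}, errNoMaster
}

// chooseMaster keeps an existing master or, if none exists, picks the healthy
// replica with the highest offset (ties broken by host:port for determinism).
func chooseMaster(nodes []Node, minHealthy int) (Node, error) {
	var healthy []Node
	for _, n := range nodes {
		if n.Healthy {
			healthy = append(healthy, n)
		}
	}
	if len(healthy) < minHealthy {
		return Node{}, fmt.Errorf("minimum replication size is not met, only %d are healthy", len(healthy))
	}
	if m, err := discoverMaster(healthy); err == nil {
		return m, nil // a master already exists; nothing to elect
	}
	sort.Slice(healthy, func(i, j int) bool {
		if healthy[i].Offset != healthy[j].Offset {
			return healthy[i].Offset > healthy[j].Offset
		}
		return healthy[i].Host+":"+healthy[i].Port < healthy[j].Host+":"+healthy[j].Port
	})
	return healthy[0], nil
}

func main() {
	// IPs taken from the logs above; offsets are made-up sample values.
	nodes := []Node{
		{Host: "10.60.1.15", Port: "6379", Healthy: true, Role: "slave", Offset: 100},
		{Host: "10.60.0.24", Port: "6379", Healthy: true, Role: "slave", Offset: 250},
		{Host: "10.60.2.20", Port: "6379", Healthy: true, Role: "slave", Offset: 250},
	}
	m, err := chooseMaster(nodes, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("promote %s:%s\n", m.Host, m.Port) // promote 10.60.0.24:6379
}
```

Actually promoting the chosen node would then mean sending it `REPLICAOF NO ONE` and pointing the other pods at it with `REPLICAOF <host> <port>`.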

NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                       AGE
redis-shared-redis                      ClusterIP   10.64.6.241    <none>        6379/TCP                      22m
redis-shared-redis-headless             ClusterIP   None           <none>        6379/TCP                      22m
redis-shared-redis-master               ClusterIP   10.64.4.69     <none>        6379/TCP                      22m

The master Service still ends up with a ClusterIP even though none of the pods is a master.
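To check what the master Service actually routes to, one could list its backing pod IPs with `kubectl get endpoints redis-shared-redis-master` and then run `redis-cli -h <ip> INFO replication` against each one. The `role:` field reports `master` or `slave`, and a replica also reports `master_host`. The Go helper below is a minimal sketch that parses that text. The function name is hypothetical, and the sample input is illustrative, not captured from this cluster.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseInfo turns the "key:value" lines of Redis INFO output into a map.
func parseInfo(info string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		// ScanLines already drops the trailing \r that INFO uses; TrimSpace is a safeguard.
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and section headers like "# Replication"
		}
		if i := strings.IndexByte(line, ':'); i > 0 {
			fields[line[:i]] = line[i+1:]
		}
	}
	return fields
}

func main() {
	// Illustrative INFO replication output from a replica.
	sample := "# Replication\r\nrole:slave\r\nmaster_host:10.60.0.24\r\nmaster_port:6379\r\nmaster_link_status:up\r\n"
	f := parseInfo(sample)
	fmt.Println(f["role"], f["master_host"]) // slave 10.60.0.24
}
```

If every endpoint behind the master Service reports `role:slave`, that confirms the symptom described above.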

Hi!

Can you please provide more info about the environment?
Kubernetes version, Redis version?

Hi, at the time it was the latest Google Kubernetes Engine release, 1.15. I've since moved to the Spotahome operator, which also failed in the same environment for different reasons. GKE has since pushed out 1.16, and my cluster is now on it as well.

I run a GKE cluster of 3 preemptible nodes (24h maximum lifetime, but they die at different times) in a staging environment, and I wanted to see if this operator worked better. It turns out that none of the Redis operators I tried can fail over fast enough, which results in a cluster outage because the master can't be found. This specific operator, though, was never able to recover despite all nodes and pods being up.

So I suspect this could happen to any cluster over time if you're unlucky enough. I've decided to move to Elixir/Phoenix and GenServer to remove my Redis dependency entirely and see if that solves my problem better.

@nrvnrvn Hi Nick, we have been trying for a while to pick an operator so we can avoid Redis Enterprise. We're going to try yours, but I'm concerned about whether the project is still supported. We only have 6 months of K8s experience, though we have decades of programming and infrastructure experience, just not in Go; we're a .NET shop. Is there any chance you could close the open issues and give us a hand if we get stuck? If we can contribute, we will :) Kind regards, Anthony

Hi @asumner,

thanks for the interest in this project.

It is still supported. I am currently focusing on #12, though I'm having a hard time finding the time for it. :)

I will try to address the open issues shortly, especially #18 and #17.

@nrvnrvn I installed the operator and it worked out of the box :) I tested recovery from a deleted master and it seemed to work fine. I've moved on to getting Istio and Envoy to play nicely with the Redis cluster; mTLS is the first issue, but I'm getting there. I'm not clear on how your operator interacts with sharding, or whether that's just part of the learning curve for Redis configuration?
The environment is Azure AKS v1.17.7 with Redis 6.0.7-Alpine.

PS: following the instructions, the only thing I forgot to do was create the secret; after that it started fine. (The Envoy proxy will mess with gossip and masters, so I'm trying to enable the EnvoyFilter for Redis.)

This operator is a Kubernetes-native replacement for Sentinel. :)

I will be happy to add Redis Cluster support if there is enough interest in it.

@nrvnrvn Ah, so my shiny new cluster doesn't support sharding? I concluded that the operator got rid of Sentinel (having Kafka and ZooKeeper fun), and I thought this operator relied on the later Redis versions that support master/slave replication and sharding... Have I just installed a dead end? We are using it today for Kafka-to-DB in Redis and for pub/sub, but soon we'll need sharding for scale and resiliency. (I'm going to guess it's the CRD that needs to implement more cluster-aware properties?)

@asumner unfortunately, as of now it does not support Redis Cluster.

But I will be happy to add support for it.

This operator's initial goal was to support the 1 master <-> N replicas architecture the same way Sentinel provides it, but without the burden of maintaining Sentinel itself.

Redis Cluster is N masters, each with its own replicas, and requires a somewhat different approach to setup and management.
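Part of that difference is how data is sharded. Per the Redis Cluster specification, every key maps to one of 16384 hash slots via `CRC16(key) mod 16384` (the XMODEM variant of CRC16), and each master owns a range of slots. A non-empty `{...}` hash tag restricts hashing to the tagged part, so related keys land on the same shard. The following is a minimal standalone Go sketch of that mapping; it is not code from this operator, and the function names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to one of 16384 hash slots. If the key contains a
// non-empty {hash tag}, only the tag is hashed; "{}" means hash the whole key.
func keySlot(key string) int {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return int(crc16([]byte(key)) & 16383)
}

func main() {
	for _, k := range []string{"123456789", "{user1000}.following", "{user1000}.followers"} {
		fmt.Printf("%-22s slot %d\n", k, keySlot(k))
	}
}
```

For `123456789`, the CRC16 check value is 0x31C3, which gives slot 12739. On a real cluster, the result should match what `CLUSTER KEYSLOT <key>` reports.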

If you need sharding and all the other Cluster features immediately, this operator is not for you. :(

If "soon" means a couple of months, then I will be happy to help you with it. :)