[ZK] Triggering validation plan returns an error for zookeeper operator

Question

rishabh96b opened this issue 4 years ago · 0 comments

Description

The validation plan of the zookeeper operator does not run properly, yet it is marked as COMPLETE. Please find the detailed logs below.

└── zookeeper-instance (Operator-Version: "zookeeper-3.4.14-0.3.1" Active-Plan: "validation")
    ├── Plan deploy (serial strategy) [NOT ACTIVE]
    │   ├── Phase zookeeper (parallel strategy) [NOT ACTIVE]
    │   │   └── Step deploy [NOT ACTIVE]
    │   └── Phase validation (serial strategy) [NOT ACTIVE]
    │       ├── Step validation [NOT ACTIVE]
    │       └── Step cleanup [NOT ACTIVE]
    ├── Plan not-allowed (serial strategy) [NOT ACTIVE]
    │   └── Phase not-allowed (serial strategy) [NOT ACTIVE]
    │       └── Step not-allowed [NOT ACTIVE]
    └── Plan validation (serial strategy) [COMPLETE], last updated 2021-01-04 20:10:40
        └── Phase connection (serial strategy) [COMPLETE]
            ├── Step connection [COMPLETE]
            └── Step cleanup [COMPLETE]

Command

kubectl kudo plan trigger --name=validation --instance=zookeeper-instance

The kudo-controller logs are flooded with messages like these:

2021/01/04 14:20:10 HealthUtil: unknown type *v1beta1.PodDisruptionBudget is marked healthy by default
2021/01/04 14:20:10 HealthUtil: statefulset "zookeeper-instance-zookeeper" is not healthy: Waiting for 1 pods to be ready...
2021/01/04 14:20:10 TaskExecution: object default/zookeeper-instance-zookeeper is NOT healthy: statefulset "zookeeper-instance-zookeeper" is not healthy: Waiting for 1 pods to be ready...
2021/01/04 14:20:10 PlanExecution: 'deploy' step(s) (instance: default/zookeeper-instance) of the deploy.zookeeper are not ready
2021/01/04 14:20:10 InstanceController: Received Reconcile request for instance default/zookeeper-instance

The plan is supposed to trigger a job that prints the zookeeper URI, but it is unable to create any job. The controller logs state:

 HealthUtil: job "zookeeper-instance-validation" still running or failed
2021/01/04 14:20:28 TaskExecution: object default/zookeeper-instance-validation is NOT healthy: job "zookeeper-instance-validation" still running or failed
2021/01/04 14:20:28 PlanExecution: 'validation' task(s) (instance: default/zookeeper-instance) of the deploy.validation.validation are not ready
2021/01/04 14:20:28 PlanExecution: 'validation,cleanup' step(s) (instance: default/zookeeper-instance) of the deploy.validation are not ready
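
Since the controller output is noisy, a small filter can help isolate the lines that mention a particular object. The following is only a rough sketch: it assumes every line has the `date time Component: message` shape seen in the excerpts above (the timestamp is treated as optional because one pasted line lacks it), and `summarize` is a hypothetical helper name, not part of KUDO.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpts in this issue:
#   2021/01/04 14:20:28 TaskExecution: object default/... is NOT healthy: ...
# The timestamp is optional because one pasted line is missing it.
LINE_RE = re.compile(
    r"^\s*(?:\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )?"
    r"(?P<component>\w+): (?P<message>.*)$"
)


def summarize(lines, needle):
    """Count, per logging component, the lines whose message mentions `needle`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group("message"):
            counts[match.group("component")] += 1
    return counts


# Sample lines copied from the logs in this issue.
sample = [
    ' HealthUtil: job "zookeeper-instance-validation" still running or failed',
    '2021/01/04 14:20:28 TaskExecution: object default/zookeeper-instance-validation '
    'is NOT healthy: job "zookeeper-instance-validation" still running or failed',
    "2021/01/04 14:20:28 PlanExecution: 'validation' task(s) (instance: "
    "default/zookeeper-instance) of the deploy.validation.validation are not ready",
    "2021/01/04 14:20:32 http: TLS handshake error from 10.0.130.81:56732: EOF",
]

if __name__ == "__main__":
    for component, count in summarize(sample, "zookeeper-instance-validation").items():
        print(component, count)
```

To run it against the real controller output, the saved log lines could be passed in place of `sample`.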

The zookeeper-instance StatefulSet looks to be okay.

2021-01-04 14:24:16,272 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@222] - Accepted socket connection from /127.0.0.1:39720
2021-01-04 14:24:16,272 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@908] - Processing ruok command from /127.0.0.1:39720
2021-01-04 14:24:16,273 [myid:3] - INFO  [Thread-290:NIOServerCnxn@1056] - Closed socket connection for client /127.0.0.1:39720 (no session established for client)

Lastly, I am also getting a TLS handshake error:

2021/01/04 14:20:31 InstanceController: Error when updating instance status. Operation cannot be fulfilled on instances.kudo.dev "zookeeper-instance": the object has been modified; please apply your changes to the latest version and try again
2021/01/04 14:20:32 InstanceController: Received Reconcile request for instance default/zookeeper-instance
2021/01/04 14:20:32 Computing health out of 0 Deployments, 0 ReplicaSets, 1 StatefulSets, 0 DaemonSets, 3 Pods
2021/01/04 14:20:32 Updating instance default/zookeeper-instance readiness to: true
2021/01/04 14:20:32 InstanceController: Readiness did not change for default/zookeeper-instance. Not updating.
2021/01/04 14:20:32 http: TLS handshake error from 10.0.130.81:56732: EOF
2021/01/04 14:20:42 http: TLS handshake error from 10.0.130.81:56844: EOF
...

KUDO Version

KUDO Version: version.Info{GitVersion:"0.17.2", GitCommit:"d902714c", BuildDate:"2020-11-16T20:34:11Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64", KubernetesClientVersion:"v0.19.2"}

I also tried this with KUDO version 0.17.0 and got the same error.