banzaicloud/koperator

Cruise Control is not ready (yet)

lavis11 opened this issue · 2 comments

Description

I have upgraded kafka-operator from v0.21.2 to v0.24.0 and Kafka from 2.8.0 to 3.4.1.
The Cruise Control version is 2.5.101.
The pods and services are up and running, but the operator reports that Cruise Control is not ready (yet).

I applied the new CRDs:
cruisecontroloperations.kafka.banzaicloud.io
kafkaclusters.kafka.banzaicloud.io
kafkatopics.kafka.banzaicloud.io
kafkausers.kafka.banzaicloud.io

Log line from Kafka Operator

{"level":"info","ts":"2024-02-09T11:05:13.892Z","msg":"got response for request","controller":"CruiseControlOperation","controllerGroup":"kafka.banzaicloud.io","controllerKind":"CruiseControlOperation","CruiseControlOperation":{"name":"kafka-rebalance-l8468","namespace":"kafka"},"namespace":"kafka","name":"kafka-rebalance-l8468","reconcileID":"715bb6cd-437d-4a6c-bcd7-66d8c1f794aa","url":"http://vxkafka-cruisecontrol-svc.kafka.svc.cluster.local:8090/kafkacruisecontrol/state?json=true&substates=ANALYZER%2CANOMALY_DETECTOR%2CEXECUTOR%2CMONITOR&verbose=true","status":200}

{"level":"info","ts":"2024-02-09T11:05:13.892Z","msg":"requeue event as Cruise Control is not ready (yet)","controller":"CruiseControlOperation","controllerGroup":"kafka.banzaicloud.io","controllerKind":"CruiseControlOperation","CruiseControlOperation":{"name":"kafka-rebalance-l8468","namespace":"kafka"},"namespace":"kafka","name":"kafka-rebalance-l8468","reconcileID":"715bb6cd-437d-4a6c-bcd7-66d8c1f794aa","status":{"MonitorReady":true,"ExecutorReady":true,"AnalyzerReady":false,"ProposalReady":false,"GoalsReady":false,"MonitoredWindows":0,"MonitoringCoverage":0}}

Log line from Cruise Control ("status":"notReady")

{"AnalyzerState":{"isProposalReady":false,"readyGoals":[],"goalReadiness":[{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.0,"requiredNumSnapshots":1},"status":"notReady","name":"ReplicaCapacityGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"DiskCapacityGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"NetworkInboundCapacityGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"NetworkOutboundCapacityGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"CpuCapacityGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.0,"requiredNumSnapshots":1},"status":"notReady","name":"ReplicaDistributionGoal"},{"modelCompleteRequirement":{"includeAllTopics":false,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"PotentialNwOutGoal"},{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"DiskUsageDistributionGoal"},{"modelCompleteRequirement":{"includeAllTopics":false,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"NetworkInboundUsageDistributionGoal"},{"modelCompleteRequirement":{"includeAllTopics":false,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"NetworkOutboundUsageDistributionGoal"},{"modelCompleteRequirement":{"includeAllTopics":false,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"CpuUsageDistributionGoal"},
{"modelCompleteRequirement":{"includeAllTopics":true,"minMonitoredPartitionsPercentage":0.0,"requiredNumSnapshots":1},"status":"notReady","name":"TopicReplicaDistributionGoal"},{"modelCompleteRequirement":{"includeAllTopics":false,"minMonitoredPartitionsPercentage":0.995,"requiredNumSnapshots":1},"status":"notReady","name":"LeaderBytesInDistributionGoal"}]},"MonitorState":{"trainingPct":0.0,"trained":false,"numFlawedPartitions":0,"monitoredWindows":{},"state":"RUNNING","numTotalPartitions":0,"numMonitoredWindows":0,"monitoringCoveragePct":0.0,"reasonOfLatestPauseOrResume":"N/A","numValidPartitions":0},"ExecutorState":{"state":"NO_TASK_IN_PROGRESS"},"AnomalyDetectorState":{"recentBrokerFailures":[],"recentGoalViolations":[],"selfHealingDisabled":[],"balancednessScore":100.0,"selfHealingEnabled":["BROKER_FAILURE","DISK_FAILURE","GOAL_VIOLATION","METRIC_ANOMALY","TOPIC_ANOMALY","MAINTENANCE_EVENT"],"recentDiskFailures":[],"metrics":{"meanTimeToStartFixMs":0.0,"meanTimeBetweenAnomaliesMs":{"BROKER_FAILURE":0.0,"DISK_FAILURE":0.0,"GOAL_VIOLATION":0.0,"TOPIC_ANOMALY":0.0,"MAINTENANCE_EVENT":0.0,"METRIC_ANOMALY":0.0},"ongoingAnomalyDurationMs":0,"numSelfHealingStarted":0,"numSelfHealingFailedToStart":0},"recentMetricAnomalies":[],"recentTopicAnomalies":[],"selfHealingEnabledRatio":{"BROKER_FAILURE":1.0,"DISK_FAILURE":1.0,"GOAL_VIOLATION":1.0,"METRIC_ANOMALY":1.0,"TOPIC_ANOMALY":1.0,"MAINTENANCE_EVENT":1.0},"recentMaintenanceEvents":[]},"version":1}
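To see at a glance why the analyzer is not ready, the state response above can be summarized with a few lines of Python. This is only a sketch: `summarize_cc_state` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed-down excerpt of the response above (two goals, a few monitor fields), not the full payload.

```python
import json

# Trimmed excerpt of the Cruise Control state response shown above (assumption:
# only the fields relevant to readiness are kept).
SAMPLE = """
{"AnalyzerState": {"isProposalReady": false, "readyGoals": [],
  "goalReadiness": [
    {"modelCompleteRequirement": {"requiredNumSnapshots": 1},
     "status": "notReady", "name": "ReplicaCapacityGoal"},
    {"modelCompleteRequirement": {"requiredNumSnapshots": 1},
     "status": "notReady", "name": "DiskCapacityGoal"}]},
 "MonitorState": {"state": "RUNNING", "numTotalPartitions": 0,
  "numMonitoredWindows": 0, "monitoringCoveragePct": 0.0,
  "numValidPartitions": 0}}
"""


def summarize_cc_state(state: dict) -> dict:
    """Collect the readiness-related fields of a parsed state response."""
    analyzer = state.get("AnalyzerState", {})
    monitor = state.get("MonitorState", {})
    # Goals whose status is reported as "notReady" in goalReadiness.
    not_ready = [
        goal["name"]
        for goal in analyzer.get("goalReadiness", [])
        if goal.get("status") == "notReady"
    ]
    return {
        "proposal_ready": analyzer.get("isProposalReady", False),
        "not_ready_goals": not_ready,
        "monitored_windows": monitor.get("numMonitoredWindows", 0),
        "valid_partitions": monitor.get("numValidPartitions", 0),
        "total_partitions": monitor.get("numTotalPartitions", 0),
        "coverage_pct": monitor.get("monitoringCoveragePct", 0.0),
    }


if __name__ == "__main__":
    print(summarize_cc_state(json.loads(SAMPLE)))
```

For the full response above, every goal is listed as not ready, and the monitor reports zero monitored windows and zero valid partitions while each goal asks for `requiredNumSnapshots: 1`. That is consistent with Cruise Control not having completed any metrics window yet.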

Do I need to configure anything for Cruise Control?

Hello @lavis11 !
Is your Kafka cluster healthy?
Which Kafka broker image are you using?
Do you see any errors in the Cruise Control pod?

Hi @bartam1

Yes, the Kafka cluster is healthy.

I am using the kafka image ghcr.io/banzaicloud/kafka:2.13-3.4.1

No, I don't see any errors in Cruise Control.

This log is from kafka-operator

{"level":"error","ts":"2024-02-09T10:22:36.832Z","msg":"failed to get unavailable brokers at rebalance","controller":"CruiseControlTask","controllerGroup":"kafka.banzaicloud.io","controllerKind":"KafkaCluster","KafkaCluster":{"name":"kafka-cluster","namespace":"kafka"},"namespace":"kafka","name": kafka-cluster","reconcileID":"d766cfb9-d909-42c8-b559-e8423eb0f8d7","error":"failed to get list of volumes per broker from Cruise Control: sending HTTP request failed: Get \"http://kafka-cruisecontrol-svc.kafka.svc.cluster.local:8090/kafkacruisecontrol/kafka_cluster_state?json=true&verbose=true\": dial tcp 10.43.23.144:8090: connect: connection refused","errorVerbose":"sending HTTP request failed: Get \"http://kafka-cruisecontrol-svc.kafka.svc.cluster.local:8090/kafkacruisecontrol/kafka_cluster_state?json=true&verbose=true\": dial tcp 10.43.23.144:8090: connect: connection refused\nfailed to get list of volumes per broker from Cruise Control
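
The `connection refused` in the error above means nothing accepted the TCP connection on port 8090 at that moment. A minimal sketch for re-checking this by hand, assuming Python is available in a pod inside the cluster (the `svc.cluster.local` name only resolves there); `host_port` and `can_connect` are hypothetical helper names, not part of the operator:

```python
import socket
from urllib.parse import urlsplit


def host_port(url: str) -> tuple[str, int]:
    """Extract host and port from a URL, falling back to the scheme default."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # URL taken from the operator error above.
    url = ("http://kafka-cruisecontrol-svc.kafka.svc.cluster.local:8090"
           "/kafkacruisecontrol/kafka_cluster_state?json=true&verbose=true")
    print(host_port(url), can_connect(url))
```

This only tests TCP reachability. It says nothing about whether Cruise Control itself considers its goals ready.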