banzaicloud/koperator

Koperator crashes when a broker has no storage configurations set in KafkaCluster

panyuenlau opened this issue · 1 comment

Description

Koperator crashes when users don't have storage configurations set for any of the brokers under the KafkaCluster CR.

Expected Behavior

Koperator should handle the case where any of the brokers don't have storage configurations set.

Actual Behavior

Koperator crashes because of nil pointer dereference:

{"level":"info","ts":"2023-07-21T14:28:21.735Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"KafkaCluster","controllerGroup":"kafka.banzaicloud.io","controllerKind":"KafkaCluster","KafkaCluster":{"name":"test","namespace":"default"},"namespace":"default","name":"test","reconcileID":"c2ced75e-acaf-404d-aa53-ecab89a4f1d5"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1804819]

goroutine 504 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x1d09840, 0x3464fd0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/banzaicloud/koperator/pkg/resources/kafka.(*Reconciler).Reconcile(0xc00056b340, {{0x2375930?, 0xc0013f3cb0?}, 0x1c54020?})
	/workspace/pkg/resources/kafka/kafka.go:251 +0x2419
github.com/banzaicloud/koperator/controllers.(*KafkaClusterReconciler).Reconcile(0xc0003f20a0, {0x2370650, 0xc0013f3d40}, {{{0xc0007ecb20?, 0x10?}, {0xc0007ecb1c?, 0x40da67?}}})
	/workspace/controllers/kafkacluster_controller.go:126 +0x8e3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2370650?, {0x2370650?, 0xc0013f3d40?}, {{{0xc0007ecb20?, 0x1c58e60?}, {0xc0007ecb1c?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004521e0, {0x23705a8, 0xc0007341c0}, {0x1daf980?, 0xc000e26360?})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323 +0x3a5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004521e0, {0x23705a8, 0xc0007341c0})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:231 +0x333

Affected Version

<= v0.25.1

Steps to Reproduce

  1. Intentionally leave storage configurations unset for all of the brokers in the cluster:
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaCluster
metadata:
  name: test
spec:
  ...
  brokers:
    - id: 0
    - id: 1
  ...
  2. Observe Koperator behavior.

Root cause

Koperator expects each broker to have storage configurations set via either brokers[x].storageConfigs or brokers[x].brokerConfigGroup, and it wrongly assumes that users always provide one of the two.
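
To illustrate the failure mode, here is a minimal, self-contained Go sketch using simplified, hypothetical types that only mirror the shape of the CR (they are not the actual Koperator API types); it reproduces the same class of nil pointer dereference seen in the stack trace above:

package main

import "fmt"

// Hypothetical, simplified types mirroring the shape of the KafkaCluster CR;
// these are not the actual Koperator API types.
type StorageConfig struct {
	MountPath string
}

type BrokerConfig struct {
	StorageConfigs []StorageConfig
}

type Broker struct {
	Id                int32
	BrokerConfigGroup string
	BrokerConfig      *BrokerConfig // nil when the user sets no per-broker config
}

func main() {
	// A broker with neither storageConfigs nor brokerConfigGroup set,
	// matching the reproduction CR above.
	broker := Broker{Id: 0}

	// Unguarded dereference: broker.BrokerConfig is nil here, so this line
	// panics with "invalid memory address or nil pointer dereference", the
	// same failure mode seen in the reconciler stack trace.
	fmt.Println(len(broker.BrokerConfig.StorageConfigs))
}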

Potential Solutions

  1. When neither configuration is provided, Koperator applies a default storage configuration (backed by a PVC) to the broker, e.g.:
  - mountPath: "/kafka-logs"
    pvcSpec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

Note: this might require us to start implementing a mutating webhook in Koperator; a sketch of such a defaulting step follows.
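
A minimal sketch of this defaulting step, reusing the hypothetical types from the sketch above (the helper name and its placement are assumptions, not the actual Koperator code):

// Hypothetical defaulting helper; in practice it could run in a mutating
// webhook or early in reconciliation, before the storage config is used.
func defaultStorageConfigs(broker *Broker) {
	// Only apply the default when the user provided neither a per-broker
	// storage config nor a broker config group to inherit one from.
	if broker.BrokerConfigGroup != "" {
		return
	}
	if broker.BrokerConfig == nil {
		broker.BrokerConfig = &BrokerConfig{}
	}
	if len(broker.BrokerConfig.StorageConfigs) == 0 {
		// Default mount path, paired with a PVC spec like the 10Gi example
		// above (PVC fields elided here for brevity).
		broker.BrokerConfig.StorageConfigs = []StorageConfig{{MountPath: "/kafka-logs"}}
	}
}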

  2. Handle all the potential nil pointer accesses across the current implementation (a nil-safe accessor is sketched below), and just start the broker with no storage configuration: by default Kafka uses /tmp/kafka-logs as the log directory, and K8s uses local ephemeral storage for the pod.

Note: ephemeral storage is tied to the lifecycle of a pod; when a pod finishes or is restarted, the storage is cleared out.
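
A minimal sketch of such a guard, again reusing the hypothetical types from the first sketch (the function name is an assumption, not part of the existing codebase): callers treat an empty result as "no storage configured" instead of dereferencing a possibly nil BrokerConfig.

// Hypothetical nil-safe accessor for solution 2.
func storageConfigsOrEmpty(broker Broker) []StorageConfig {
	// Returning an empty slice instead of panicking lets the reconciler
	// proceed and start the broker without any mounted storage.
	if broker.BrokerConfig == nil {
		return nil
	}
	return broker.BrokerConfig.StorageConfigs
}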