Yolean/kubernetes-kafka

Issues when forced to rebuild corrupted index

stigok opened this issue · 1 comment

Found an issue with Kafka nodes coming back up after having failed. If a broker wakes up to a corrupted index, it will attempt to rebuild it. This seems to have two major implications:

  • Memory consumption goes through the roof, resulting in the pod getting killed due to OOM (because of the resource limits, of course)
  • The readiness probe fails, and the pod will be killed if it hasn't already OOM'ed (see the config sketch after this list)
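
For context, here is a minimal sketch of the kind of container spec where these two failure modes interact. The limit, port, and probe settings are illustrative placeholders, not the values actually shipped in this repo:

```yaml
# Illustrative only: limits and probe values below are placeholders.
containers:
  - name: broker
    image: solsson/kafka
    resources:
      limits:
        memory: 1Gi          # an index rebuild can exceed this, triggering an OOM kill
    readinessProbe:
      tcpSocket:
        port: 9092           # broker does not listen until the rebuild finishes
      initialDelaySeconds: 15
      periodSeconds: 10
      failureThreshold: 3    # probe gives up long before a lengthy rebuild completes
```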

Any thoughts on how to remedy this?

We haven't gotten around to implementing it yet, but the idea is to fix the default image to support the ./kafka-server-stop.sh command (solsson/dockerfiles@4fb7b5d) and to use a preStop pod lifecycle hook to invoke it.
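
A minimal sketch of what that could look like in the StatefulSet, assuming the script ends up under the image's bin directory; the exact path and grace period depend on how the image change lands:

```yaml
# Sketch only: the script path and grace period are assumptions,
# pending the image change referenced above.
containers:
  - name: broker
    image: solsson/kafka
    lifecycle:
      preStop:
        exec:
          command: ["./bin/kafka-server-stop.sh"]
terminationGracePeriodSeconds: 60   # give the broker time to shut down cleanly
```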