Issues when forced to rebuild corrupted index
stigok opened this issue · 1 comment
stigok commented
Found an issue when Kafka nodes come back up after having failed. If a node wakes up to a corrupted index, it will attempt to repair it. This seems to have two major implications:
- Memory consumption goes through the roof, resulting in the pod getting killed due to OOM (because of the resource limits, of course)
- The readiness probe fails, and the pod will be killed if it hasn't already OOM'ed
Any thoughts on how to remedy this?
solsson commented
We haven't gotten around to implementing it yet, but the idea is to fix the default image to support the ./kafka-server-stop.sh command (solsson/dockerfiles@4fb7b5d) and to use a preStop pod lifecycle hook to invoke it.
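For reference, such a hook might look like the sketch below in the Kafka StatefulSet's container spec. The script path `/opt/kafka/bin/kafka-server-stop.sh` is an assumption about where the image would place the script, not something confirmed in this thread:

```yaml
# Hypothetical sketch of the proposed preStop hook; the script path
# is assumed and depends on how the image is eventually built.
lifecycle:
  preStop:
    exec:
      command: ["/opt/kafka/bin/kafka-server-stop.sh"]
```

A clean shutdown via the hook would let Kafka flush and close its logs before the pod is terminated, avoiding the corrupted index (and the expensive recovery) on the next start. The pod's `terminationGracePeriodSeconds` would likely need to be raised so the broker has time to finish shutting down.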