JahstreetOrg/spark-on-kubernetes-helm

Liveness & readiness checks time out on Livy

Closed this issue · 2 comments

Currently, the liveness and readiness probes check the /batches endpoint:
https://github.com/jahstreet/spark-on-kubernetes-helm/blob/a1fd2ac19580feb0d9469c1d7cadd8630710ac13/charts/livy/templates/statefulset.yaml#L33

When there is a large number of batches, these checks occasionally time out:

Events:
  Type     Reason     Age                 From                                                    Message
  ----     ------     ----                ----                                                    -------
  Warning  Unhealthy  54m (x56 over 10d)  kubelet, ip-XX  Readiness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  54m (x59 over 10d)  kubelet, ip-XX  Liveness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Would it be OK to add ?size=1 to limit the response size, or at least to have an option to disable these checks in the Livy chart?
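
For illustration, the first option could look roughly like the fragment below. This is only a sketch, not the chart's actual template: the port number is taken from the event log above, and the timing values are placeholder assumptions. If the chart does not set timeoutSeconds, the Kubernetes default of 1 second applies, so raising it is a second knob worth considering.

```yaml
# Sketch only: field values are assumptions, not the chart's current settings.
livenessProbe:
  httpGet:
    path: /batches?size=1   # ask Livy for at most one batch to keep the response small
    port: 8998              # port seen in the probe failure messages
  timeoutSeconds: 5         # assumed value; the Kubernetes default is 1
  periodSeconds: 10         # assumed value
readinessProbe:
  httpGet:
    path: /batches?size=1
    port: 8998
  timeoutSeconds: 5         # assumed value
  periodSeconds: 10         # assumed value
```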

Good point, thanks for mentioning it. I will update the chart.

This will be fixed in #39. The proposed solution is to call the /version endpoint instead.
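
A probe along those lines might look like the following. This is a hedged sketch of the idea rather than the actual change in #39: the port is the one shown in the event log, and everything else is left at Kubernetes defaults.

```yaml
# Sketch only: not copied from #39.
livenessProbe:
  httpGet:
    path: /version   # response does not grow with the number of batches
    port: 8998       # port seen in the probe failure messages
readinessProbe:
  httpGet:
    path: /version
    port: 8998
```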