banzaicloud/spark-metrics

Spark Metrics Stop Pushing After Pushgateway Restarts

julianblack opened this issue · 0 comments

I've been having problems with my Prometheus Pushgateway in Kubernetes that cause the pod to restart. I noticed that after a restart, many of the metrics from the Spark-Prometheus plugin stop being pushed to the Pushgateway.

Steps to reproduce the issue:
This should be easily reproducible by restarting the Prometheus Pushgateway while an active Spark app is pushing metrics.

Expected behavior
I would expect the plugin to log an error message when the Prometheus Pushgateway goes down, and to start pushing again once it comes back up, since each push is a separate HTTP request rather than a long-running connection.
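
To make the expectation concrete, here is a minimal Python sketch of the behavior I have in mind. It is not the plugin's actual code: `push_each_cycle` and `FlakyGateway` are hypothetical names I made up for illustration, and the "gateway" is simulated in memory rather than reached over HTTP. The point is only that each reporting cycle makes an independent attempt, a failure is logged, and later cycles still run.

```python
import logging
from typing import Callable, List, Set

logger = logging.getLogger("metrics_push_sketch")


class FlakyGateway:
    """Hypothetical stand-in for a Pushgateway that is down for some calls."""

    def __init__(self, down_calls: Set[int]) -> None:
        self.down_calls = down_calls  # 1-based call numbers that should fail
        self.calls = 0

    def push(self) -> None:
        self.calls += 1
        if self.calls in self.down_calls:
            raise ConnectionError("Pushgateway unavailable")


def push_each_cycle(push: Callable[[], None], cycles: int) -> List[bool]:
    """Call `push` once per cycle; a failed cycle never stops later cycles."""
    results = []
    for cycle in range(1, cycles + 1):
        try:
            push()  # one independent request per cycle
            results.append(True)
        except Exception as exc:
            # Log the failure and carry on so the next cycle tries again.
            logger.error("push failed in cycle %d: %s", cycle, exc)
            results.append(False)
    return results


if __name__ == "__main__":
    gateway = FlakyGateway(down_calls={2, 3})  # simulate a restart
    print(push_each_cycle(gateway.push, 5))
```

With the simulated outage on the second and third calls, the pushes fail for those two cycles and succeed again from the fourth cycle onward, which is the recovery I would expect to see from the plugin.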

Is there a way to achieve this behavior? I appreciate the help.

Screenshot of the Pushgateway after the restart (note that only 3 nodes are pushing):

image