Parsing error
azelezni opened this issue · 6 comments
I'm unable to send any metrics to the Prometheus Pushgateway. I'm getting the following error:
2019-01-16 08:16:41 INFO PrometheusSink:54 - metricsNamespace=None, sparkAppName=None, sparkAppId=None, executorId=None
2019-01-16 08:16:41 INFO PrometheusSink:54 - role=shuffle, job=shuffle
2019-01-16 08:16:41 INFO PushGatewayWithTimestamp:217 - Sending metrics data to 'http://fkpr-prometheus-pushgateway.fkpr:9091/metrics/job/shuffle/role/shuffle'
2019-01-16 08:16:41 INFO PushGatewayWithTimestamp:247 - Error response from http://fkpr-prometheus-pushgateway.fkpr:9091/metrics/job/shuffle/role/shuffle
2019-01-16 08:16:41 INFO PushGatewayWithTimestamp:250 - text format parsing error in line 244: second HELP line for metric name "HiveExternalCatalog_fileCacheHits"
2019-01-16 08:16:41 ERROR PushGatewayWithTimestamp:255 - Sending metrics failed due to:
java.io.IOException: Response code from http://fkpr-prometheus-pushgateway.fkpr:9091/metrics/job/shuffle/role/shuffle was 400
at com.banzaicloud.metrics.prometheus.client.exporter.PushGatewayWithTimestamp.doRequest(PushGatewayWithTimestamp.java:252)
at com.banzaicloud.metrics.prometheus.client.exporter.PushGatewayWithTimestamp.pushAdd(PushGatewayWithTimestamp.java:168)
at com.banzaicloud.spark.metrics.sink.PrometheusSink$Reporter.report(PrometheusSink.scala:122)
at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hi @azelezni, can you share the metrics-name-capture-regex and metrics-name-replacement settings of your PrometheusSink config?
What version of spark-metrics library are you using?
Hi @stoader, I'm using Spark 2.3.2 with the latest spark-metrics release.
I was able to get around this using the following metrics.properties:
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=fkpr-prometheus-pushgateway.fkpr:9091
*.sink.prometheus.period=10
*.sink.prometheus.unit=seconds
*.sink.prometheus.pushgateway-enable-timestamp=false
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false
master.sink.prometheus.metrics-name-capture-regex=(.*)
master.sink.prometheus.metrics-name-replacement=master_$1
worker.sink.prometheus.metrics-name-capture-regex=(.*)
worker.sink.prometheus.metrics-name-replacement=worker_$1
executor.sink.prometheus.metrics-name-capture-regex=(.*)
executor.sink.prometheus.metrics-name-replacement=executor_$1
driver.sink.prometheus.metrics-name-capture-regex=(.*)
driver.sink.prometheus.metrics-name-replacement=driver_$1
applications.sink.prometheus.metrics-name-capture-regex=(.*)
applications.sink.prometheus.metrics-name-replacement=app_$1
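To illustrate what the per-instance rules above do to a metric name, here is a minimal Python sketch. It is not the spark-metrics implementation: the thread does not show how the sink applies the regex, so the sketch assumes one substitution over the whole name. `rename_metric` and `RULES` are hypothetical names, and the Java-style `$1` group reference is translated to Python's `\g<1>` syntax.

```python
import re

def rename_metric(name, capture_regex, replacement):
    """Apply one capture-regex/replacement rule to a metric name (assumed behaviour)."""
    # Translate Java-style "$1" group references into Python's "\g<1>" form.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # count=1: on Python 3.7+, "(.*)" also matches the empty string at the end
    # of the name, so replacing every match would emit the prefix a second time.
    return re.sub(capture_regex, py_replacement, name, count=1)

# The rules from the metrics.properties above, keyed by Spark instance.
RULES = {
    "master": ("(.*)", "master_$1"),
    "worker": ("(.*)", "worker_$1"),
    "executor": ("(.*)", "executor_$1"),
    "driver": ("(.*)", "driver_$1"),
    "applications": ("(.*)", "app_$1"),
}

name = "HiveExternalCatalog_fileCacheHits"
renamed = {
    instance: rename_metric(name, regex, replacement)
    for instance, (regex, replacement) in RULES.items()
}
for instance, new_name in renamed.items():
    print(f"{instance}: {new_name}")
```

Under these assumptions each instance ends up with its own prefixed metric name, so the same underlying metric reported by two instances no longer shares one name.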
@azelezni can you provide the metrics.properties that reproduces the issue?
Note that the exception above was published from the shuffle service:
2019-01-16 08:16:41 INFO PrometheusSink:54 - metricsNamespace=None, sparkAppName=None, sparkAppId=None, executorId=None
2019-01-16 08:16:41 INFO PrometheusSink:54 - role=shuffle, job=shuffle
Are you running the external shuffle service as well?
If not, then the reason the metrics are reported as coming from shuffle is that spark-metrics is currently built for Spark jobs where metrics are published from the driver, executors and shuffle service, and is not prepared for standalone Spark deployments (see https://github.com/banzaicloud/spark-metrics/blob/2.3-2.0.4/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L79)
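The role=shuffle and job=shuffle values from the log also appear in the push URL, because the Pushgateway encodes the job name and grouping labels in the request path. A minimal Python sketch of that path layout follows. `pushgateway_url` is a hypothetical helper, not part of spark-metrics, and it assumes label values without "/" or other characters that need escaping.

```python
def pushgateway_url(base, job, grouping_key):
    """Build a Pushgateway push URL: <base>/metrics/job/<job>/<label>/<value>/..."""
    parts = [base.rstrip("/"), "metrics", "job", job]
    # Each grouping label contributes a "<label>/<value>" pair to the path.
    for label, value in grouping_key.items():
        parts += [label, value]
    return "/".join(parts)

url = pushgateway_url(
    "http://fkpr-prometheus-pushgateway.fkpr:9091",
    job="shuffle",
    grouping_key={"role": "shuffle"},
)
print(url)
```

With the values from the log, this reproduces the URL shown in the "Sending metrics data to" line.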
The following metrics.properties causes the error:
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=fkpr-prometheus-pushgateway.fkpr:9091
*.sink.prometheus.period=10
*.sink.prometheus.unit=seconds
*.sink.prometheus.pushgateway-enable-timestamp=false
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false
Yes, I'm running Spark standalone. However, I think that with the regex replacement it's good enough for my needs.
Can you re-run with the debug log level enabled? That would log the payload being sent to the Pushgateway.
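Once the payload is captured from the debug log, a quick check for repeated HELP lines can point to the metric names the Pushgateway rejects. The sketch below is only an illustration. `duplicate_help_names` is a hypothetical helper, and the sample payload is made up for the example rather than taken from a real log.

```python
import re
from collections import Counter
from typing import List

HELP_LINE = re.compile(r"^# HELP (\S+)")

def duplicate_help_names(payload: str) -> List[str]:
    """Return metric names that have more than one '# HELP' line, in first-seen order."""
    counts = Counter()
    for line in payload.splitlines():
        match = HELP_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return [name for name, n in counts.items() if n > 1]

# Made-up payload in the Prometheus text format (not from a real log).
sample = """\
# HELP HiveExternalCatalog_fileCacheHits example help text
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
# HELP HiveExternalCatalog_fileCacheHits example help text
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
"""
print(duplicate_help_names(sample))
```

Any name this reports is a candidate for the "second HELP line for metric name" error in the response above.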
Closing this issue. Please re-open if it surfaces again.