Push dropwizard metrics error, PushGatewayWithTimestamp: text format parsing error in line 64: second HELP line for metric name "HiveExternalCatalog_fileCacheHits"
kangtiann opened this issue · 18 comments
Describe the bug
Spark version: 2.4.3
spark-metrics version: spark-metrics_2.11-2.3-2.1.1.jar
Error in spark master.log:
PushGatewayWithTimestamp: text format parsing error in line 64: second HELP line for metric name "HiveExternalCatalog_fileCacheHits"
The reason is that the pushgateway may not accept duplicate metrics.
Here is the push request body (captured with debug-level logging):
# HELP HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
# HELP HiveExternalCatalog_filesDiscovered Generated from Dropwizard metric import (metric=HiveExternalCatalog.filesDiscovered, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_filesDiscovered gauge
HiveExternalCatalog_filesDiscovered 0.0
# HELP HiveExternalCatalog_hiveClientCalls Generated from Dropwizard metric import (metric=HiveExternalCatalog.hiveClientCalls, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_hiveClientCalls gauge
HiveExternalCatalog_hiveClientCalls 0.0
# HELP HiveExternalCatalog_parallelListingJobCount Generated from Dropwizard metric import (metric=HiveExternalCatalog.parallelListingJobCount, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_parallelListingJobCount gauge
HiveExternalCatalog_parallelListingJobCount 0.0
# HELP HiveExternalCatalog_partitionsFetched Generated from Dropwizard metric import (metric=HiveExternalCatalog.partitionsFetched, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_partitionsFetched gauge
HiveExternalCatalog_partitionsFetched 0.0
# HELP CodeGenerator_compilationTime Generated from Dropwizard metric import (metric=CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_compilationTime summary
CodeGenerator_compilationTime{quantile="0.5"} 0.0
CodeGenerator_compilationTime{quantile="0.75"} 0.0
CodeGenerator_compilationTime{quantile="0.95"} 0.0
CodeGenerator_compilationTime{quantile="0.98"} 0.0
CodeGenerator_compilationTime{quantile="0.99"} 0.0
CodeGenerator_compilationTime{quantile="0.999"} 0.0
CodeGenerator_compilationTime_count 0.0
# HELP CodeGenerator_generatedClassSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedClassSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedClassSize summary
CodeGenerator_generatedClassSize{quantile="0.5"} 0.0
CodeGenerator_generatedClassSize{quantile="0.75"} 0.0
CodeGenerator_generatedClassSize{quantile="0.95"} 0.0
CodeGenerator_generatedClassSize{quantile="0.98"} 0.0
CodeGenerator_generatedClassSize{quantile="0.99"} 0.0
CodeGenerator_generatedClassSize{quantile="0.999"} 0.0
CodeGenerator_generatedClassSize_count 0.0
# HELP CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedMethodSize summary
CodeGenerator_generatedMethodSize{quantile="0.5"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.75"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.95"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.98"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.99"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.999"} 0.0
CodeGenerator_generatedMethodSize_count 0.0
# HELP CodeGenerator_sourceCodeSize Generated from Dropwizard metric import (metric=CodeGenerator.sourceCodeSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_sourceCodeSize summary
CodeGenerator_sourceCodeSize{quantile="0.5"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.75"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.95"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.98"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.99"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.999"} 0.0
CodeGenerator_sourceCodeSize_count 0.0
# HELP master_aliveWorkers Generated from Dropwizard metric import (metric=master.aliveWorkers, type=org.apache.spark.deploy.master.MasterSource$$anon$2)
# TYPE master_aliveWorkers gauge
master_aliveWorkers 1.0
# HELP master_apps Generated from Dropwizard metric import (metric=master.apps, type=org.apache.spark.deploy.master.MasterSource$$anon$3)
# TYPE master_apps gauge
master_apps 0.0
# HELP master_waitingApps Generated from Dropwizard metric import (metric=master.waitingApps, type=org.apache.spark.deploy.master.MasterSource$$anon$4)
# TYPE master_waitingApps gauge
master_waitingApps 0.0
# HELP master_workers Generated from Dropwizard metric import (metric=master.workers, type=org.apache.spark.deploy.master.MasterSource$$anon$1)
# TYPE master_workers gauge
master_workers 1.0
# HELP HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
# HELP HiveExternalCatalog_filesDiscovered Generated from Dropwizard metric import (metric=HiveExternalCatalog.filesDiscovered, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_filesDiscovered gauge
HiveExternalCatalog_filesDiscovered 0.0
# HELP HiveExternalCatalog_hiveClientCalls Generated from Dropwizard metric import (metric=HiveExternalCatalog.hiveClientCalls, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_hiveClientCalls gauge
HiveExternalCatalog_hiveClientCalls 0.0
# HELP HiveExternalCatalog_parallelListingJobCount Generated from Dropwizard metric import (metric=HiveExternalCatalog.parallelListingJobCount, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_parallelListingJobCount gauge
HiveExternalCatalog_parallelListingJobCount 0.0
# HELP HiveExternalCatalog_partitionsFetched Generated from Dropwizard metric import (metric=HiveExternalCatalog.partitionsFetched, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_partitionsFetched gauge
HiveExternalCatalog_partitionsFetched 0.0
# HELP CodeGenerator_compilationTime Generated from Dropwizard metric import (metric=CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_compilationTime summary
CodeGenerator_compilationTime{quantile="0.5"} 0.0
CodeGenerator_compilationTime{quantile="0.75"} 0.0
CodeGenerator_compilationTime{quantile="0.95"} 0.0
CodeGenerator_compilationTime{quantile="0.98"} 0.0
CodeGenerator_compilationTime{quantile="0.99"} 0.0
CodeGenerator_compilationTime{quantile="0.999"} 0.0
CodeGenerator_compilationTime_count 0.0
# HELP CodeGenerator_generatedClassSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedClassSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedClassSize summary
CodeGenerator_generatedClassSize{quantile="0.5"} 0.0
CodeGenerator_generatedClassSize{quantile="0.75"} 0.0
CodeGenerator_generatedClassSize{quantile="0.95"} 0.0
CodeGenerator_generatedClassSize{quantile="0.98"} 0.0
CodeGenerator_generatedClassSize{quantile="0.99"} 0.0
CodeGenerator_generatedClassSize{quantile="0.999"} 0.0
CodeGenerator_generatedClassSize_count 0.0
# HELP CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedMethodSize summary
CodeGenerator_generatedMethodSize{quantile="0.5"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.75"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.95"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.98"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.99"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.999"} 0.0
CodeGenerator_generatedMethodSize_count 0.0
# HELP CodeGenerator_sourceCodeSize Generated from Dropwizard metric import (metric=CodeGenerator.sourceCodeSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_sourceCodeSize summary
CodeGenerator_sourceCodeSize{quantile="0.5"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.75"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.95"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.98"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.99"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.999"} 0.0
CodeGenerator_sourceCodeSize_count 0.0
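Note that every metric family in the payload above appears twice, which is what the Pushgateway's parser rejects. The following is only a rough, self-contained sketch of the Dropwizard -> Prometheus -> Pushgateway flow (not the sink's actual code; the object name and registry setup are made up for illustration). It assumes the io.prometheus.client simpleclient, simpleclient_dropwizard, and simpleclient_pushgateway libraries on the classpath and a Pushgateway at 127.0.0.1:9091; registering the same Dropwizard registry through two exporter instances is one way a duplicated payload like the one above can arise.

```scala
import com.codahale.metrics.MetricRegistry
import io.prometheus.client.CollectorRegistry
import io.prometheus.client.dropwizard.DropwizardExports
import io.prometheus.client.exporter.PushGateway

object PushRepro extends App {
  // A Dropwizard registry with one of the counters seen in the dump above.
  val dropwizard = new MetricRegistry
  dropwizard.counter("HiveExternalCatalog.fileCacheHits").inc()

  // Registering two exporter instances over the same Dropwizard registry makes
  // every metric family show up twice in the generated text payload.
  val prometheus = new CollectorRegistry
  prometheus.register(new DropwizardExports(dropwizard))
  prometheus.register(new DropwizardExports(dropwizard))

  // The push is then expected to fail with the Pushgateway's 400 response,
  // surfaced as an IOException carrying the parse error quoted in this issue.
  new PushGateway("127.0.0.1:9091").push(prometheus, "spark_master_repro")
}
```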
Additional context
I will submit a PR to fix this.
That's a warning from the Prometheus Pushgateway, which doesn't accept two instances of the same metric (same keys) with different help messages. This is down to how the Dropwizard Prometheus exporter generates help messages for metrics. In the latest version this should not happen, because the help string is now set to a fixed message (https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/DropwizardExports.scala#L32), so two instances of the same metric can't have different help strings. Are you sure you're running the latest version?
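To illustrate the idea, here is a rough sketch of such an exporter. It is not the project's actual DropwizardExports class; the class name and the counters-only handling are a simplification, assuming the io.prometheus.client Collector API. Because the HELP text is a constant, repeated exports of the same metric family can never carry diverging HELP lines.

```scala
import java.util.Collections

import scala.collection.JavaConverters._

import com.codahale.metrics.MetricRegistry
import io.prometheus.client.Collector
import io.prometheus.client.Collector.MetricFamilySamples

class FixedHelpDropwizardExports(registry: MetricRegistry) extends Collector {
  // Constant help text, independent of the underlying Dropwizard metric type,
  // so every export of the same metric carries an identical HELP line.
  private val Help = "Generated from Dropwizard metric import"

  override def collect(): java.util.List[MetricFamilySamples] = {
    registry.getCounters.asScala.map { case (name, counter) =>
      val sanitized = Collector.sanitizeMetricName(name)
      val sample = new MetricFamilySamples.Sample(
        sanitized,
        Collections.emptyList[String](),
        Collections.emptyList[String](),
        counter.getCount.toDouble)
      new MetricFamilySamples(sanitized, Collector.Type.GAUGE, Help,
        Collections.singletonList(sample))
    }.toList.asJava
  }
}
```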
The second HELP comment may be the problem.
Can you check that the correct jar (the one you built from the master branch) is picked up by your Spark deployment, and not an older cached version? In the latest version, as mentioned before, the HELP string is fixed to Generated from Dropwizard metric import. Help strings like the ones in your log, e.g. HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter), were generated by an earlier version of the jar.
I pushed a fix; can you take the latest master and try again?
Can you show the code snippet at PrometheusSink.scala:237 in your version of spark-metrics?
The help message is OK now, but it still doesn't work.
What error do you see now?
Can you describe the steps to reproduce the error so that we can repro it in our dev environment?
Here is my docker compose: prometheus-docker.zip
STEP 1: Start Prometheus and the Pushgateway
docker-compose up
STEP 2: With Spark 2.4.4, edit metrics.properties
# Enable Prometheus for all instances by class name
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
# Prometheus pushgateway address
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=127.0.0.1:9091
*.sink.prometheus.period=10
*.sink.prometheus.unit=seconds
*.sink.prometheus.pushgateway-enable-timestamp=false
## Metrics name processing (version 2.3-1.1.0 +)
#*.sink.prometheus.metrics-name-capture-regex=<regular expression to capture metric name sections to be replaced>
#*.sink.prometheus.metrics-name-replacement=<replacement that the captured sections are replaced with>
#*.sink.prometheus.labels=<labels in label=value format separated by commas>
# Support for JMX Collector (version 2.3-2.0.0 +)
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false
#*.sink.prometheus.jmx-collector-config=/opt/spark/conf/monitoring/jmxCollector.yaml
# Enable HostName in Instance instead of Appid (Default value is false i.e. instance=${appid})
#*.sink.prometheus.enable-hostname-in-instance=true
# Enable JVM metrics source for all instances by class name
#*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
#*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
STEP 3: Copy the spark-metrics and dependency jars to Spark's jars path
STEP 4: Start Spark in standalone mode.
./sbin/start-master.sh
./sbin/start-slave.sh spark://XXXXXXXXX:7077
@kangtiann I couldn't reproduce this issue.
I downloaded the spark-metrics jar and its dependencies using the command mvn dependency:get -DgroupId=com.banzaicloud -DartifactId=spark-metrics_2.11 -Dversion=2.3-2.1.2.
Then I copied the downloaded jars to Spark's jars path: cp ~/.m2/repository/com/banzaicloud/spark-metrics_2.11/2.3-2.1.2/spark-metrics_2.11-2.3-2.1.2.jar assembly/target/scala-2.11/jars/
Also, I used the metrics.properties that you just provided above.
Can you create a docker-compose that also starts spark-master and spark-slave, with the jars you use included, so that the issue is reproducible?
Here is a docker-compose that contains Spark and the spark-metrics jars (Spark master only): prometheus-docker.zip
Spark Dockerfile:
Error message:
Can you retry with the latest master?
Note that you need to compile spark-metrics with sbt ++2.11.12 package for Scala 2.11 (your docker-compose setup uses Scala 2.11).
Also, the package of the sink has changed, so use *.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink in your metrics.properties file.