banzaicloud/spark-metrics

Push dropwizard metrics error, PushGatewayWithTimestamp: text format parsing error in line 64: second HELP line for metric name "HiveExternalCatalog_fileCacheHits"

kangtiann opened this issue · 18 comments

Describe the bug

Spark version: 2.4.3
spark-metrics version: spark-metrics_2.11-2.3-2.1.1.jar

Error in spark master.log:

PushGatewayWithTimestamp: text format parsing error in line 64: second HELP line for metric name "HiveExternalCatalog_fileCacheHits"

Reason: the Pushgateway may not accept duplicate metrics.

Here is the push request body (captured with debug-level logging):

# HELP HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
# HELP HiveExternalCatalog_filesDiscovered Generated from Dropwizard metric import (metric=HiveExternalCatalog.filesDiscovered, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_filesDiscovered gauge
HiveExternalCatalog_filesDiscovered 0.0
# HELP HiveExternalCatalog_hiveClientCalls Generated from Dropwizard metric import (metric=HiveExternalCatalog.hiveClientCalls, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_hiveClientCalls gauge
HiveExternalCatalog_hiveClientCalls 0.0
# HELP HiveExternalCatalog_parallelListingJobCount Generated from Dropwizard metric import (metric=HiveExternalCatalog.parallelListingJobCount, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_parallelListingJobCount gauge
HiveExternalCatalog_parallelListingJobCount 0.0
# HELP HiveExternalCatalog_partitionsFetched Generated from Dropwizard metric import (metric=HiveExternalCatalog.partitionsFetched, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_partitionsFetched gauge
HiveExternalCatalog_partitionsFetched 0.0
# HELP CodeGenerator_compilationTime Generated from Dropwizard metric import (metric=CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_compilationTime summary
CodeGenerator_compilationTime{quantile="0.5"} 0.0
CodeGenerator_compilationTime{quantile="0.75"} 0.0
CodeGenerator_compilationTime{quantile="0.95"} 0.0
CodeGenerator_compilationTime{quantile="0.98"} 0.0
CodeGenerator_compilationTime{quantile="0.99"} 0.0
CodeGenerator_compilationTime{quantile="0.999"} 0.0
CodeGenerator_compilationTime_count 0.0
# HELP CodeGenerator_generatedClassSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedClassSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedClassSize summary
CodeGenerator_generatedClassSize{quantile="0.5"} 0.0
CodeGenerator_generatedClassSize{quantile="0.75"} 0.0
CodeGenerator_generatedClassSize{quantile="0.95"} 0.0
CodeGenerator_generatedClassSize{quantile="0.98"} 0.0
CodeGenerator_generatedClassSize{quantile="0.99"} 0.0
CodeGenerator_generatedClassSize{quantile="0.999"} 0.0
CodeGenerator_generatedClassSize_count 0.0
# HELP CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedMethodSize summary
CodeGenerator_generatedMethodSize{quantile="0.5"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.75"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.95"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.98"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.99"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.999"} 0.0
CodeGenerator_generatedMethodSize_count 0.0
# HELP CodeGenerator_sourceCodeSize Generated from Dropwizard metric import (metric=CodeGenerator.sourceCodeSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_sourceCodeSize summary
CodeGenerator_sourceCodeSize{quantile="0.5"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.75"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.95"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.98"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.99"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.999"} 0.0
CodeGenerator_sourceCodeSize_count 0.0
# HELP master_aliveWorkers Generated from Dropwizard metric import (metric=master.aliveWorkers, type=org.apache.spark.deploy.master.MasterSource$$anon$2)
# TYPE master_aliveWorkers gauge
master_aliveWorkers 1.0
# HELP master_apps Generated from Dropwizard metric import (metric=master.apps, type=org.apache.spark.deploy.master.MasterSource$$anon$3)
# TYPE master_apps gauge
master_apps 0.0
# HELP master_waitingApps Generated from Dropwizard metric import (metric=master.waitingApps, type=org.apache.spark.deploy.master.MasterSource$$anon$4)
# TYPE master_waitingApps gauge
master_waitingApps 0.0
# HELP master_workers Generated from Dropwizard metric import (metric=master.workers, type=org.apache.spark.deploy.master.MasterSource$$anon$1)
# TYPE master_workers gauge
master_workers 1.0
# HELP HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_fileCacheHits gauge
HiveExternalCatalog_fileCacheHits 0.0
# HELP HiveExternalCatalog_filesDiscovered Generated from Dropwizard metric import (metric=HiveExternalCatalog.filesDiscovered, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_filesDiscovered gauge
HiveExternalCatalog_filesDiscovered 0.0
# HELP HiveExternalCatalog_hiveClientCalls Generated from Dropwizard metric import (metric=HiveExternalCatalog.hiveClientCalls, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_hiveClientCalls gauge
HiveExternalCatalog_hiveClientCalls 0.0
# HELP HiveExternalCatalog_parallelListingJobCount Generated from Dropwizard metric import (metric=HiveExternalCatalog.parallelListingJobCount, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_parallelListingJobCount gauge
HiveExternalCatalog_parallelListingJobCount 0.0
# HELP HiveExternalCatalog_partitionsFetched Generated from Dropwizard metric import (metric=HiveExternalCatalog.partitionsFetched, type=com.codahale.metrics.Counter)
# TYPE HiveExternalCatalog_partitionsFetched gauge
HiveExternalCatalog_partitionsFetched 0.0
# HELP CodeGenerator_compilationTime Generated from Dropwizard metric import (metric=CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_compilationTime summary
CodeGenerator_compilationTime{quantile="0.5"} 0.0
CodeGenerator_compilationTime{quantile="0.75"} 0.0
CodeGenerator_compilationTime{quantile="0.95"} 0.0
CodeGenerator_compilationTime{quantile="0.98"} 0.0
CodeGenerator_compilationTime{quantile="0.99"} 0.0
CodeGenerator_compilationTime{quantile="0.999"} 0.0
CodeGenerator_compilationTime_count 0.0
# HELP CodeGenerator_generatedClassSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedClassSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedClassSize summary
CodeGenerator_generatedClassSize{quantile="0.5"} 0.0
CodeGenerator_generatedClassSize{quantile="0.75"} 0.0
CodeGenerator_generatedClassSize{quantile="0.95"} 0.0
CodeGenerator_generatedClassSize{quantile="0.98"} 0.0
CodeGenerator_generatedClassSize{quantile="0.99"} 0.0
CodeGenerator_generatedClassSize{quantile="0.999"} 0.0
CodeGenerator_generatedClassSize_count 0.0
# HELP CodeGenerator_generatedMethodSize Generated from Dropwizard metric import (metric=CodeGenerator.generatedMethodSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_generatedMethodSize summary
CodeGenerator_generatedMethodSize{quantile="0.5"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.75"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.95"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.98"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.99"} 0.0
CodeGenerator_generatedMethodSize{quantile="0.999"} 0.0
CodeGenerator_generatedMethodSize_count 0.0
# HELP CodeGenerator_sourceCodeSize Generated from Dropwizard metric import (metric=CodeGenerator.sourceCodeSize, type=com.codahale.metrics.Histogram)
# TYPE CodeGenerator_sourceCodeSize summary
CodeGenerator_sourceCodeSize{quantile="0.5"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.75"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.95"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.98"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.99"} 0.0
CodeGenerator_sourceCodeSize{quantile="0.999"} 0.0
CodeGenerator_sourceCodeSize_count 0.0
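A minimal sketch of the check behind the reported error, assuming (as the error text suggests) that the text-format parser allows only one HELP line per metric name. The function name find_second_help is hypothetical, and this is not the Pushgateway's actual parser. Counting from the first "# HELP" line of the body above, the second HELP line for HiveExternalCatalog_fileCacheHits is line 64 of the body, which matches the line number in the error.

```python
from typing import Optional, Tuple


def find_second_help(body: str) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, metric name) of the first repeated HELP line, or None."""
    seen = set()
    for lineno, line in enumerate(body.splitlines(), start=1):
        parts = line.split(None, 3)  # "#", "HELP", metric name, help text
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            name = parts[2]
            if name in seen:
                return lineno, name
            seen.add(name)
    return None


# Shortened sample in the same shape as the push body above.
sample = "\n".join([
    "# HELP master_workers Generated from Dropwizard metric import",
    "# TYPE master_workers gauge",
    "master_workers 1.0",
    "# HELP master_workers Generated from Dropwizard metric import",
    "# TYPE master_workers gauge",
    "master_workers 1.0",
])
print(find_second_help(sample))  # (4, 'master_workers')
```

Running a check like this on the debug-logged body before it is pushed would flag the duplicate locally, without a round trip to the Pushgateway.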

Steps to reproduce the issue:

Expected behavior

Screenshots

image

Additional context

I will submit a PR to fix this.

That's a warning from the Prometheus Pushgateway, which doesn't accept two instances of a metric that have the same keys but different help messages. This has to do with how the Dropwizard Prometheus exporter generates help messages for metrics. In the latest version this should not happen, because the help string is now set to a fixed message (https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/DropwizardExports.scala#L32), so two instances of the same metric won't have different help strings. Are you sure that you're running the latest version?

I built from master.

image

Can you check that the jar you built from the master branch is the one picked up by your Spark deployment, and not an older cached version? As I mentioned, in the latest version the HELP string is fixed to "Generated from Dropwizard metric import". Help strings like the one in your log, "HiveExternalCatalog_fileCacheHits Generated from Dropwizard metric import (metric=HiveExternalCatalog.fileCacheHits, type=com.codahale.metrics.Counter)", were generated by an earlier version of the jar.

Why is the help message not fixed to "Generated from Dropwizard metric import"?

I think the DropwizardExports class is imported from the wrong path.
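One way to check for a stale jar is to list every spark-metrics jar in Spark's jars directory and see whether more than one version is present. The sketch below is an illustration only: the helper name find_spark_metrics_jars is hypothetical, and the demo uses a throwaway directory with the two jar names mentioned in this thread rather than a real Spark installation.

```python
import tempfile
from pathlib import Path
from typing import List


def find_spark_metrics_jars(jars_dir: str) -> List[str]:
    """List the spark-metrics jar file names found in jars_dir, sorted by name."""
    return sorted(p.name for p in Path(jars_dir).glob("spark-metrics_*.jar"))


# Demo against a temporary directory standing in for Spark's jars directory.
with tempfile.TemporaryDirectory() as jars_dir:
    for name in (
        "spark-metrics_2.11-2.3-2.1.1.jar",
        "spark-metrics_2.11-2.3-2.1.2.jar",
        "unrelated.jar",
    ):
        (Path(jars_dir) / name).touch()
    found = find_spark_metrics_jars(jars_dir)

if len(found) > 1:
    print("More than one spark-metrics jar found:", found)
```

If two versions are listed, which one gets loaded depends on classpath ordering, so removing the older jar is the safer option.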

image

After changing that, it still does not work...

image

image

I think the duplicate metrics should be removed, like this:

image
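The de-duplication idea can be sketched on the text format itself. This is a minimal illustration, not the change made in spark-metrics: it assumes every metric family starts with a "# HELP" line, as in the push body above, and it keeps only the first block seen for each name. The function name drop_duplicate_families is hypothetical.

```python
def drop_duplicate_families(body: str) -> str:
    """Keep only the first HELP/TYPE/sample block for each metric family name."""
    seen = set()
    kept = []
    keep_current = True  # lines before the first HELP line are kept as-is
    for line in body.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            name = parts[2]
            keep_current = name not in seen  # drop the block if the name was seen
            seen.add(name)
        if keep_current:
            kept.append(line)
    return "\n".join(kept)


sample_lines = [
    "# HELP master_apps Generated from Dropwizard metric import",
    "# TYPE master_apps gauge",
    "master_apps 0.0",
    "# HELP master_workers Generated from Dropwizard metric import",
    "# TYPE master_workers gauge",
    "master_workers 1.0",
    "# HELP master_apps Generated from Dropwizard metric import",
    "# TYPE master_apps gauge",
    "master_apps 0.0",
]
deduplicated = drop_duplicate_families("\n".join(sample_lines))
print(deduplicated)
```

Keeping the first block is an arbitrary choice here. If two blocks for the same name carried different values, a real fix would have to decide which one is correct, or avoid registering the metric twice in the first place.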

I pushed a fix. Can you take the latest master and try again?

It still does not work. My Pushgateway version is v1.0.0.

image

image

Can you show the code at PrometheusSink.scala:237 in your version of spark-metrics?

The help message is OK now, but it still does not work.

What error do you see now?

The error is the same as before.
If I remove the duplicate metrics, everything is OK.

image

Can you describe the steps to reproduce the error so that we can reproduce it in our dev environment?

Here is my Docker Compose setup: prometheus-docker.zip

STEP 1: Start Prometheus and the Pushgateway

docker-compose up

STEP 2: Using Spark 2.4.4, edit metrics.properties

# Enable Prometheus for all instances by class name
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
# Prometheus pushgateway address
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=127.0.0.1:9091
*.sink.prometheus.period=10
*.sink.prometheus.unit=seconds
*.sink.prometheus.pushgateway-enable-timestamp=false
## Metrics name processing (version 2.3-1.1.0 +)
#*.sink.prometheus.metrics-name-capture-regex=<regular expression to capture the sections of the metric name to be replaced>
#*.sink.prometheus.metrics-name-replacement=<replacement for the captured sections>
#*.sink.prometheus.labels=<labels in label=value format separated by comma>
# Support for JMX Collector (version 2.3-2.0.0 +)
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false
#*.sink.prometheus.jmx-collector-config=/opt/spark/conf/monitoring/jmxCollector.yaml

# Enable HostName in Instance instead of Appid (Default value is false i.e. instance=${appid})
#*.sink.prometheus.enable-hostname-in-instance=true

# Enable JVM metrics source for all instances by class name
#*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
#*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
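To double-check which Pushgateway endpoint this configuration points at, the properties can be parsed and the two address settings combined. The sketch below is a simplified reader, not Spark's own loader: it handles only key=value lines and "#" comments, and the helper names parse_properties and pushgateway_url are hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def pushgateway_url(props: Dict[str, str], instance: str = "*") -> str:
    """Combine the protocol and address settings into a base URL."""
    prefix = f"{instance}.sink.prometheus."
    protocol = props[prefix + "pushgateway-address-protocol"]
    address = props[prefix + "pushgateway-address"]
    return f"{protocol}://{address}"


config = """
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=127.0.0.1:9091
#*.sink.prometheus.enable-hostname-in-instance=true
"""
props = parse_properties(config)
print(pushgateway_url(props))  # http://127.0.0.1:9091
```

Commented-out keys, such as enable-hostname-in-instance above, do not appear in the parsed result, which makes it easy to see which settings are actually in effect.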

STEP 3: Copy the spark-metrics jar and its dependency jars to Spark's jar path

STEP 4: Start spark with standalone mode.

./sbin/start-master.sh
./sbin/start-slave.sh spark://XXXXXXXXX:7077

@kangtiann I couldn't reproduce this issue.

I downloaded the spark-metrics jar and its dependencies with: mvn dependency:get -DgroupId=com.banzaicloud -DartifactId=spark-metrics_2.11 -Dversion=2.3-2.1.2

Then I copied the downloaded jars to Spark's jar path: cp ~/.m2/repository/com/banzaicloud/spark-metrics_2.11/2.3-2.1.2/spark-metrics_2.11-2.3-2.1.2.jar ssembly/target/scala-2.11/jars/

Also, I used the metrics.properties that you just provided above.

Can you create a docker-compose setup that also starts spark-master and spark-slave, with the jars you use included, where this issue is reproducible?

Here is a docker compose setup that contains Spark and the spark-metrics jars (Spark master only): prometheus-docker.zip

Spark Dockerfile

image

Error message

image

Can you retry with the latest master?

Note that you need to compile spark-metrics with sbt ++2.11.12 package for Scala 2.11 (your docker-compose setup uses Scala 2.11).

Also, the package of the sink has changed, so use *.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink in your metrics.properties file.
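The class-name change can be applied to an existing metrics.properties mechanically. The sketch below is only an illustration of that edit: the helper name migrate_sink_class is hypothetical, and the two class names are the ones quoted in this thread. Commented-out lines and unrelated keys are left untouched.

```python
OLD_SINK = "com.banzaicloud.spark.metrics.sink.PrometheusSink"
NEW_SINK = "org.apache.spark.banzaicloud.metrics.sink.PrometheusSink"


def migrate_sink_class(properties_text: str) -> str:
    """Point active '.sink.prometheus.class' entries at the new sink package."""
    out = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        is_active = not line.lstrip().startswith("#")
        if (
            sep
            and is_active
            and key.strip().endswith(".sink.prometheus.class")
            and value.strip() == OLD_SINK
        ):
            line = f"{key.strip()}={NEW_SINK}"
        out.append(line)
    return "\n".join(out)


before = "\n".join([
    "# Enable Prometheus for all instances by class name",
    "*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink",
    "*.sink.prometheus.period=10",
])
after = migrate_sink_class(before)
print(after)
```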