banzaicloud/spark-metrics

pushgateway info log message to "Fix pushed metrics!"

andrewgdavis opened this issue · 2 comments

Prometheus reports that the help messages for some metrics are in conflict. For example, the logs show the driver and executor help messages conflicting:

time="2019-05-08T22:16:33Z" level=info msg="Metric families 
'name:\"java_lang_MemoryPool_CollectionUsage_used\" help:\"java.lang.management.MemoryUsage (java.lang<type=MemoryPool, name=Tenured Gen><CollectionUsage>used)\" type:UNTYPED 
### driver metrics collected for java_lang_MemoryPool_CollectionUsage_used 
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"Tenured Gen\" > label:<name:\"role\" value:\"driver\" > untyped:<value:6.2114752e+07 > > 
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"Survivor Space\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.433232e+06 > > 
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"Eden Space\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > ' 
and 'name:\"java_lang_MemoryPool_CollectionUsage_used\" help:\"java.lang.management.MemoryUsage (java.lang<type=MemoryPool, name=PS Old Gen><CollectionUsage>used)\" type:UNTYPED 
### sample of executor metrics for java_lang_MemoryPool_CollectionUsage_used
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"PS Old Gen\" > label:<name:\"number\" value:\"5\" > label:<
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"PS Eden Space\" > label:<name:\"number\" value:\"5\" > labe
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"PS Survivor Space\" > label:<name:\"number\" value:\"5\" > 
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"PS Old Gen\" > label:<name:\"number\" value:\"6\" > label:<
metric:<label:<name:\"app_name\" value:\"\" > label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"recent\" > label:<name:\"name\" value:\"PS Eden Space\" > label:<name:\"number\" value:\"6\" > labe
have inconsistent help strings. The latter will have priority. This is bad. Fix your pushed metrics!" source="diskmetricstore.go:126"

The help message for java_lang_MemoryPool_CollectionUsage_used should be the same for the driver and the executor, but it looks like the metric name is used twice: once with the "Tenured Gen" help text and once with the "PS Old Gen" help text. Is there an updated lib or config that can help address this?

@andrewgdavis the Pushgateway doesn't like metrics that share a name but have different help messages. To my knowledge, it doesn't drop the metric itself; it just warns about the difference and picks one of the two help messages.
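As a rough illustration of that behaviour, here is a simplified model in Python. It is not the Pushgateway's actual code: `MetricFamily` and `merge_families` are hypothetical names, and the only rule it borrows from the log above is that on a help mismatch "the latter will have priority" while the samples themselves are kept.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class MetricFamily:
    # Hypothetical, simplified stand-in for a pushed metric family.
    name: str
    help: str
    samples: List[dict] = field(default_factory=list)


def merge_families(
    groups: Iterable[Iterable[MetricFamily]],
) -> Tuple[Dict[str, MetricFamily], List[str]]:
    """Merge families from several push groups by metric name.

    When two families share a name but differ in help text, record a
    warning and let the later help text win. No samples are dropped.
    """
    merged: Dict[str, MetricFamily] = {}
    warnings: List[str] = []
    for families in groups:
        for fam in families:
            seen = merged.get(fam.name)
            if seen is None:
                # Copy so the caller's objects are not mutated.
                merged[fam.name] = MetricFamily(fam.name, fam.help, list(fam.samples))
                continue
            if seen.help != fam.help:
                warnings.append(f"{fam.name}: inconsistent help strings, keeping the latter")
                seen.help = fam.help
            seen.samples.extend(fam.samples)
    return merged, warnings
```

Feeding it one family from the driver (help text mentioning "Tenured Gen") and one from an executor (help text mentioning "PS Old Gen") under the same name would yield a single merged family carrying both sets of samples, the executor's help text, and one warning.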

The help message is generated by the Dropwizard exporter library, and the way it generates it may lead to help message inconsistencies.

Starting from spark-metrics 2.3-2.0.1, the help messages generated by the Dropwizard library are overwritten with the static message "Generated from Dropwizard metric import" just before the metrics are pushed to the Pushgateway.
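The idea behind that overwrite can be sketched in a few lines of Python. This is not the spark-metrics implementation; the dict layout and the function names `normalize_help` and `conflicting_names` are assumptions made for illustration. Only the static text is taken from the description above.

```python
from collections import defaultdict
from typing import Dict, List, Set

# The static help text mentioned above; everything else here is hypothetical.
STATIC_HELP = "Generated from Dropwizard metric import"


def normalize_help(families: List[dict], static_help: str = STATIC_HELP) -> List[dict]:
    """Return copies of the families with every help text replaced."""
    return [{**fam, "help": static_help} for fam in families]


def conflicting_names(families: List[dict]) -> Set[str]:
    """Metric names that appear with more than one distinct help text."""
    helps: Dict[str, Set[str]] = defaultdict(set)
    for fam in families:
        helps[fam["name"]].add(fam["help"])
    return {name for name, texts in helps.items() if len(texts) > 1}
```

With every family carrying the same help text, `conflicting_names(normalize_help(families))` comes back empty, so there is nothing left for the Pushgateway to warn about, provided the overwrite is applied to everything that gets pushed.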

Could you upgrade to spark-metrics 2.3-2.0.4 and retry?

Thanks for the reply. The spark-metrics_2.11-2.3-2.0.4.jar was already being used. Perhaps I need to see whether the Dropwizard exporter needs to be updated.