prometheus/graphite_exporter

New glob matcher reports invalid values.

kien-truong opened this issue · 5 comments

We're testing both 0.5.0 and 0.4.2 and getting errors with the new glob matcher.

With 0.5.0, the graphite_exporter errors out with a duplicate metric error:

collected metric "flink_assigned_partitions" { label:<name:"job" value:"ebs-dumper" > label:<name:"operator" value:"Source:-kafka:-ebs-umts" > label:<name:"subtask" value:"0" > label:<name:"task" value:"Source:-kafka:-ebs-umts-->-Process-ebs-umts-events" > gauge:<value:487941.72 > } was collected before with the same name and label values

The weird part is that we don't know where the value 487941.72 comes from: we never sent it, and it doesn't show up in the debug log either.

With version 0.4.2, the graphite_exporter doesn't error out, but it reports both valid and invalid values for the same series, which breaks our monitoring.

flink_assigned_partitions{job="ebs-dumper",operator="Source:-kafka:-ebs-lte",subtask="2",task="Source:-kafka:-ebs-lte-->-Process-ebs-lte-events"} 5
flink_assigned_partitions{job="ebs-dumper",operator="Source:-kafka:-ebs-lte",subtask="2",task="Source:-kafka:-ebs-lte-->-Process-ebs-lte-events"} 1.80102799e+06

Reverting to 0.3.0 or using the regex matcher avoids the issue.

This might be related to issue #79.

Apologies for the delay. The bug is a bit hard to reproduce reliably; you might have to run issue90_test.go a couple of times to trigger it.

https://github.com/kien-truong/graphite_exporter/tree/bug-90/e2e
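For reference, the core of the repro is just firing each sample at the exporter in its own UDP packet from its own goroutine. A minimal standalone sketch (metric names and values are made up for illustration; it assumes a locally running exporter on the default graphite listen address localhost:9109):

package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

func main() {
	// Each Graphite plaintext sample ("name value timestamp") goes out
	// in a separate UDP packet, concurrently, so the exporter ends up
	// processing the lines in parallel.
	lines := []string{
		"flink.ebs-dumper.0.assigned_partitions 5",
		"flink.ebs-dumper.1.assigned_partitions 5",
		"flink.ebs-dumper.2.assigned_partitions 5",
	}
	ts := time.Now().Unix()
	var wg sync.WaitGroup
	for _, l := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			conn, err := net.Dial("udp", "localhost:9109")
			if err != nil {
				fmt.Println("dial:", err)
				return
			}
			defer conn.Close()
			fmt.Fprintf(conn, "%s %d\n", l, ts)
		}(l)
	}
	wg.Wait()
}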

+1, facing the same issue with a big dataset.

Ha, thank you, the repro case is really useful. One thing I noticed is that you are sending each line individually over UDP. I'm going to try to fold this into the test proper, but in some manual testing I found a data race when pushing metrics in parallel that doesn't happen when I dump them all in over one TCP connection:

==================
WARNING: DATA RACE
Write at 0x00c000190100 by goroutine 40:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Previous write at 0x00c000190100 by goroutine 39:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Goroutine 40 (running) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db

Goroutine 39 (finished) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db
==================

I think we're missing serialization around calls to mapper.GetMapping; the statsd_exporter already serializes these accesses.
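To illustrate the kind of serialization I mean, here is a minimal self-contained sketch (the type and field names are hypothetical, not the actual main.go code and not necessarily what the fix below does): the shared mapper is guarded by a mutex so that the per-packet / per-connection goroutines calling processLine cannot race inside the lookup.

package main

import (
	"fmt"
	"sync"
)

// metricMapper stands in for the statsd_exporter MetricMapper: GetMapping
// mutates internal state (e.g. caching glob-match results), so concurrent
// unsynchronized calls race.
type metricMapper struct {
	cache map[string]string
}

func (m *metricMapper) GetMapping(graphiteName string) string {
	if mapped, ok := m.cache[graphiteName]; ok {
		return mapped
	}
	mapped := "flink_assigned_partitions" // pretend this is the glob-match result
	m.cache[graphiteName] = mapped        // unsynchronized write: the racy part
	return mapped
}

// graphiteCollector sketch: a mutex serializes all mapper lookups, since
// one goroutine per UDP packet / TCP connection calls processLine.
type graphiteCollector struct {
	mapperMu sync.Mutex
	mapper   *metricMapper
}

func (c *graphiteCollector) processLine(line string) {
	c.mapperMu.Lock()
	name := c.mapper.GetMapping(line)
	c.mapperMu.Unlock()
	_ = name // build and record the sample as before
}

func main() {
	c := &graphiteCollector{mapper: &metricMapper{cache: map[string]string{}}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.processLine(fmt.Sprintf("flink.ebs-dumper.%d.assigned_partitions 5 1554000000", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("processed all lines")
}

Running a variant of this without the mutex under go run -race reproduces warnings of the same shape as the trace above.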

I have a potential fix in #94.