InfluxGraph/influxgraph

Multiple targets in query at once come back with mixed-up labels

booch opened this issue · 8 comments

booch commented

When I make a request like the following, I get back results with the labels in the same order I requested, but the data comes back in a different order. So the labels are attached to the wrong data.

curl -v 'http://graphite-api:8000/render?
target=scaleToSeconds(nonNegativeDerivative(0769ecf3.*/cpu/total/user),1)&
target=scaleToSeconds(nonNegativeDerivative(0769ecf3.*/cpu/total/system),1)&
target=scaleToSeconds(nonNegativeDerivative(0769ecf3.*/cpu/total/iowait),1)&
target=scaleToSeconds(nonNegativeDerivative(0769ecf3.*/cpu/total/steal),1)&format=json&maxDataPoints=1'

What configuration is being used? Can you show your graphite-api.yaml?

There is a test for exactly this for template based configurations.

booch commented

@pkittenis: Here's our graphite-api.yaml:

finders:
  - influxgraph.InfluxDBFinder
influxdb:
  host: dockerhost
  memcache:
    host: localhost

I forgot to mention that this happens in 1.3.0 as well as 1.1.2.

For now, we've got a work-around in place on our client to just do 4 separate queries and combine the results ourselves.

Thanks!
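For anyone needing the same client-side workaround, a minimal sketch in Python follows. It sends one /render request per target and concatenates the JSON series lists. The host and port are copied from the curl example above; the function names (render_url, http_fetch, render_separately) are made up for illustration and are not part of graphite-api or influxgraph.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://graphite-api:8000/render"


def render_url(target, **params):
    """Build a /render URL that asks for exactly one target as JSON."""
    query = [("target", target), ("format", "json")] + sorted(params.items())
    return BASE_URL + "?" + urlencode(query)


def http_fetch(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def render_separately(targets, fetch=http_fetch, **params):
    """Issue one request per target and concatenate the series lists.

    Each response only holds series for the one target requested, so a
    label cannot end up paired with data from a different target.
    """
    combined = []
    for target in targets:
        combined.extend(fetch(render_url(target, **params)))
    return combined
```

Calling render_separately([...four targets...], maxDataPoints=1) costs one HTTP round trip per target, which is the price of this workaround. The fetch argument is only there so the function can be exercised without a running server.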

Thanks, that narrows it down to non-template based queries. I'm not sure when I'll have time to look into it, but I'm glad there is a workaround. A wildcard query should work too, as the metrics share a common path.

You could also consider moving to a template based configuration, as it generally performs better in InfluxDB.

ernsy commented

This seems to happen with template based queries too. I have my template set up as:

- "mira_ce.*.timer.land.all.all.* app.host.mtype.measurement.svc.operator.stat"

But for both queries:

target=mira_ce.host1.timer.land.all.all.count&target=mira_ce.host2.timer.land.all.all.count&target=mira_ce.host1.timer.land.all.all.upper_90&target=mira_ce.host2.timer.land.all.all.upper_90&from=-5min&until=now&format=json&maxDataPoints=1920

and

target=host*.timer.land.all.all.count&target=host*.timer.land.all.all.upper_90&from=-5min&until=now&format=json&maxDataPoints=1920

The data returned is in the following order:

mira_ce.host1.timer.land.all.all.count with correct values
mira_ce.host2.timer.land.all.all.count with values for mira_ce.host1.timer.land.all.all.upper_90
mira_ce.host1.timer.land.all.all.upper_90 with values for mira_ce.host2.timer.land.all.all.count
mira_ce.host2.timer.land.all.all.upper_90 with the correct values.

I also noted that when setting up my template as:

 - "mira_ce.host1.timer.land.all.all.* app.host.mtype.measurement.svc.operator.stat"
 - "mira_ce.host2.timer.land.all.all.* app.host.mtype.measurement.svc.operator.stat"

It works fine.

Just a side note: there are 9 different values for the stat tag and 6 different values for the host tag.

Firstly, and in general, prefer one target over multiple targets if there is one query that satisfies all target paths. That is the case for all the above queries, including the OP's (0769ecf3.*.cpu.total.*).

Single target queries are faster. Multiple targets mean multiple metrics/find queries to gather nodes, and response time goes up as the number of paths increases.

target=mira_ce.hsc185.timer.land.all.all.count&target=mira_ce.hsc186.timer.land.all.all.count&target=mira_ce.hsc185.timer.land.all.all.upper_90&target=mira_ce.hsc186.timer.land.all.all.upper_90

Can be written as target=mira_ce.*.timer.land.all.all.{count,upper_90} or target=mira_ce.{hsc185,hsc186}.timer.land.all.all.{count,upper_90}

target=hsc18*.timer.land.all.all.count&target=hsc18*.timer.land.all.all.upper_90 can also be written as target=hsc18*.timer.land.all.all.{count,upper_90}

The above queries have correct data.
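As a rough illustration of the rewrite described above, the following Python sketch merges several dotted paths of equal depth into a single brace-glob target. collapse_targets is a hypothetical helper written for this example, not something shipped with influxgraph or graphite-api.

```python
def collapse_targets(paths):
    """Merge dotted paths of equal depth into one brace-glob target.

    Positions where every path agrees keep the literal node; positions
    that differ become a {a,b} alternation. Because alternations combine
    freely, the result may match more series than were listed.
    """
    split = [p.split(".") for p in paths]
    if len({len(parts) for parts in split}) != 1:
        raise ValueError("all paths must have the same number of nodes")
    nodes = []
    for column in zip(*split):
        unique = sorted(set(column))
        if len(unique) == 1:
            nodes.append(unique[0])
        else:
            nodes.append("{" + ",".join(unique) + "}")
    return ".".join(nodes)


paths = [
    "mira_ce.hsc185.timer.land.all.all.count",
    "mira_ce.hsc186.timer.land.all.all.count",
    "mira_ce.hsc185.timer.land.all.all.upper_90",
    "mira_ce.hsc186.timer.land.all.all.upper_90",
]
single_target = collapse_targets(paths)
# single_target == "mira_ce.{hsc185,hsc186}.timer.land.all.all.{count,upper_90}"
```

The merged string goes into a single target= parameter, so only one metrics/find query is needed.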

Using field in the template will also give correct data, e.g.:

- "mira_ce.hsc185.timer.land.all.all.* app.host.mtype.measurement.svc.operator.field"

where stat is made into a field instead of a tag. Without a named field, all field names are called value, which makes conflicts like this more likely and also makes aggregation configuration apply to all metrics regardless of path (see readme).
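To make the tag versus field difference concrete, here is a deliberately simplified Python sketch of how a template lines up with a metric path. It assumes a plain positional mapping and ignores wildcards, filters and any other rules the real template handling has; apply_template is a hypothetical name used only for this illustration.

```python
def apply_template(path, template):
    """Split a dotted Graphite path into measurement, tags and field name.

    Simplified illustration: each template node names the tag for the
    path node at the same position. "measurement" and "field" are the
    two special names. With no "field" node the field name is "value".
    """
    path_nodes = path.split(".")
    template_nodes = template.split(".")
    if len(path_nodes) != len(template_nodes):
        raise ValueError("path and template must have the same depth")
    measurement, field, tags = None, "value", {}
    for name, node in zip(template_nodes, path_nodes):
        if name == "measurement":
            measurement = node
        elif name == "field":
            field = node
        else:
            tags[name] = node
    return measurement, tags, field


path = "mira_ce.hsc185.timer.land.all.all.count"
as_tag = apply_template(path, "app.host.mtype.measurement.svc.operator.stat")
as_field = apply_template(path, "app.host.mtype.measurement.svc.operator.field")
# as_tag keeps stat=count as a tag and falls back to the field name "value";
# as_field has no stat tag and uses "count" as the field name.
```

With the stat template every series shares the field name value, so only tags tell count and upper_90 apart. With the field template the two are distinct field names.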

The issue here is that influxgraph used the order of targets as-is, while the data returned by InfluxDB is sorted. For the template case, there is a fix in place in 1.3.4 and a test to replicate the above. Thanks for raising 👍

For the OP and non-templated data, I haven't been able to replicate this. As far as I can see, when not using templates the order does not matter, as data is retrieved by name.
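The effect of that ordering mismatch can be reproduced in a few lines of Python. This is only an illustration of the failure mode using the four paths from the report above, assuming the backend returns series sorted by name; it is not the actual influxgraph code or the actual fix.

```python
# Targets in the order the client requested them.
requested = [
    "mira_ce.host1.timer.land.all.all.count",
    "mira_ce.host2.timer.land.all.all.count",
    "mira_ce.host1.timer.land.all.all.upper_90",
    "mira_ce.host2.timer.land.all.all.upper_90",
]

# Assumption for this sketch: the backend returns series sorted by name.
returned = [(name, "data for " + name) for name in sorted(requested)]


def label_by_position(requested, returned):
    """Buggy pairing: trust that both lists are in the same order."""
    return {label: data for label, (_, data) in zip(requested, returned)}


def label_by_name(requested, returned):
    """Safe pairing: look each series up by its own name."""
    by_name = dict(returned)
    return {label: by_name[label] for label in requested}


wrong = label_by_position(requested, returned)
right = label_by_name(requested, returned)
# wrong: host2...count carries the data for host1...upper_90 and vice versa,
#        while the first and last labels happen to line up correctly.
# right: every label carries its own data.
```

The positional pairing swaps exactly the two middle series, which matches the mix-up reported above. Pairing by name is independent of either ordering.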

However, as (a) a fix has been released for the template case where the issue was replicated, (b) there are at least two workarounds for OP and (c) multiple query targets where a single target with wildcards would be best doesn't make a lot of sense, I'm inclined to close it.

Can re-open if @booch can show how to replicate.

booch commented

I'm no longer on the project that had the issue.

Hey @cgeers, can you try upgrading influxgraph to 1.3.4 and reverting the work-around for this, and see if it's fixed?

ernsy commented

Thanks @pkittenis for the insights, I can confirm the fix solved the issue for me

@booch thanks for the shoutout. This indeed had the desired effect.