InfluxGraph/influxgraph

Missing 'measurement' in template configuration causes invalid queries to be generated

julienmary opened this issue · 8 comments

After restarting the docker container today, I tried to render: http://localhost:8000/render?from=-1h&target=*.cpu.cpu-total.{usage_user,usage_system}&format=svg

This was working before, but today I get this response:

Traceback (most recent call last):
  File "/srv/graphite-pypy/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/graphite-pypy/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/graphite-pypy/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/graphite-pypy/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/graphite-pypy/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/srv/graphite-pypy/site-packages/graphite_api/app.py", line 375, in render
    data_store = fetchData(context, paths)
  File "/srv/graphite-pypy/site-packages/graphite_api/render/datalib.py", line 160, in fetchData
    time_info, series = finder.fetch_multi(nodes, startTime, endTime)
  File "/srv/graphite-pypy/site-packages/influxgraph/classes/finder.py", line 495, in fetch_multi
    data = self._run_infl_query(query, paths, measurement_data)
  File "/srv/graphite-pypy/site-packages/influxgraph/classes/finder.py", line 511, in _run_infl_query
    data = self.client.query(query, params=_INFLUXDB_CLIENT_PARAMS)
  File "/srv/graphite-pypy/site-packages/influxdb/client.py", line 347, in query
    in data.get('results', [])
  File "/srv/graphite-pypy/site-packages/influxdb/resultset.py", line 23, in __init__
    raise InfluxDBClientError(self.error)
InfluxDBClientError: invalid measurement

What could be causing this?

Probably the DB data was reset while a saved index, built from previously existing data, was still in use.

Will need to see:

  • steps to reproduce
  • debug output from influxgraph

to take this further.

OS: Ubuntu 16.04
On host: influxdb 1.2.0
Install the docker image
Inside the image: apt-get update / upgrade
Then: pip list --outdated | cut -d' ' -f1 | xargs pip install --upgrade # upgrading everything

graphite-api.yaml :

finders:
  - influxgraph.InfluxDBFinder
influxdb:
  host: dockerhost
  db: telegraf
  templates:
    # Default telegraf agent measurement types
    - "*.diskio. host.measurements.name.field*"
    - "*.disk. host.measurements.path.fstype.field*"
    - "*.cpu. host.measurements.cpu.field*"
    - host.measurements.field*
  memcache:
    host: localhost
log_file: /var/log/influxgraph/influxgraph.log
log_level: debug

http://localhost:8000/metrics/find?query=*.cpu.cpu-total.{usage_user,usage_system} is OK.
Response :
[{"text": "usage_system", "id": ".cpu.cpu-total.usage_system", "allowChildren": 0, "expandable": 0, "leaf": 1}, {"text": "usage_user", "id": ".cpu.cpu-total.usage_user", "allowChildren": 0, "expandable": 0, "leaf": 1}]

The influxgraph log file is never created.

Will need to create the log directory if it does not exist.
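A minimal sketch of creating that directory from Python, assuming Python 3 and the log path from the config above. The helper name `ensure_log_dir` is made up for illustration and is not part of influxgraph.

```python
import os


def ensure_log_dir(log_file):
    """Create the parent directory of log_file if it is missing.

    Hypothetical helper for illustration; not part of influxgraph.
    """
    log_dir = os.path.dirname(log_file)
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(log_dir, exist_ok=True)
    return log_dir


# Example call, using the path from the config in this thread
# (needs write access to /var/log):
# ensure_log_dir("/var/log/influxgraph/influxgraph.log")
```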

Obvious question, is there an influxdb running at dockerhost:8086? Does it contain data?

See also #24 - there are outstanding blockers with influx 1.2.0, not sure if related here but best to stick with 1.1.x for now.

The log directory is created.
There is an influxdb running at dockerhost:8086
Chronograf on the host displays the influxdb data from the query: SELECT "usage_user", "usage_system" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 5m

The log file entries are not indented correctly, should be:

influxdb:
    log_file: /var/log/influxgraph/influxgraph.log
    log_level: debug

In any case, I have been able to reproduce this. The queries from the docker container look like this:

[DEBUG] 2017-02-23 12:37:44,922 - finder._run_infl_query() - Calling influxdb multi fetch with query - select mean("usage_guest") as "usage_guest" from "" where (time > 1487849864s and time <= 1487853464s) AND (("host" = 'atitude-E6530') AND ("measurements" = 'cpu') AND ("cpu" = 'cpu0')) GROUP BY time(60s), "host", "measurements", "cpu" fill(previous)

The measurement is "", which is obviously invalid.

Weirdly, running it locally outside docker works fine.

Will have to do some debugging on the running instance; something might have changed with queries on the grafana side. I have not been able to reproduce it by running the query manually in a python interpreter inside the docker container, which is even weirder.

This is an issue with the template configuration; it should be:

  templates:
    # Default telegraf agent measurement types
    - "*.diskio. host.measurement.name.field*"
    - "*.disk. host.measurement.path.fstype.field*"
    - "*.cpu. host.measurement.cpu.field*"
    - host.measurement.field*

Not host.measurements
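To illustrate why the misspelled keyword produces an empty measurement, here is a simplified sketch of template matching. It is an assumption-laden toy, not influxgraph's actual parser: `apply_template` is a hypothetical function, and it only understands `measurement`, `field`, `field*` and plain tag keys.

```python
def apply_template(template, path):
    """Map a dotted metric path onto the parts of a template.

    Hypothetical simplification, not influxgraph's real code:
    'measurement' names the measurement, 'field*' takes the rest of the
    path as the field, and any other keyword is treated as a tag key.
    """
    measurement, tags, field = [], {}, ""
    parts = path.split(".")
    for i, key in enumerate(template.split(".")):
        if i >= len(parts):
            break
        if key == "measurement":
            measurement.append(parts[i])
        elif key == "field*":
            field = ".".join(parts[i:])
            break
        elif key == "field":
            field = parts[i]
        else:
            # Unknown keywords, including the typo 'measurements',
            # end up as ordinary tag keys.
            tags[key] = parts[i]
    return ".".join(measurement), tags, field


path = "atitude-E6530.cpu.cpu0.usage_guest"
print(apply_template("host.measurements.cpu.field*", path))
print(apply_template("host.measurement.cpu.field*", path))
```

With `measurements`, the first call returns an empty measurement plus a tag literally named `measurements`, which matches the `from ""` and `"measurements" = 'cpu'` in the debug query above. With `measurement`, the second call returns `cpu` as the measurement.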

The example config at the wiki is correct.

The config should not be accepted without a valid measurement so have updated the title accordingly.
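A rough sketch of what such a check could look like. This is not the fix that went into influxgraph; `validate_templates` is a hypothetical name, and it assumes each line has only the "[filter] template" form used in this thread, with the template as the last whitespace-separated token.

```python
def validate_templates(templates):
    """Raise ValueError for any template that has no 'measurement' part.

    Hypothetical check for illustration; assumes each line is
    "[filter] template" with the template as the last token.
    """
    for line in templates:
        template = line.split()[-1]
        if "measurement" not in template.split("."):
            raise ValueError(
                "template %r has no 'measurement' part" % line)


good = ["*.cpu. host.measurement.cpu.field*", "host.measurement.field*"]
bad = ["*.cpu. host.measurements.cpu.field*"]

validate_templates(good)  # passes silently
try:
    validate_templates(bad)
except ValueError as exc:
    print("rejected:", exc)
```

Failing at config load time like this would surface the typo immediately instead of as an "invalid measurement" error from InfluxDB at render time.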

Well ... That did it. Thanks :-)
That's sneaky. I hadn't touched anything since January 20th ... it was working with measurements at that time. I swear :-)

I believe you, no really I do 😃