trbs/bucky

InfluxDB Support

Closed this issue · 5 comments

While InfluxDB supports the carbon protocol and can act as a carbon server, native InfluxDB support would be much appreciated. The client code would probably have to be factored out into an interface/module architecture first, though.

+1

+1

I've done some work on adding preliminary support for InfluxDB in master...dimrozakis:influxdb.

While carbon saves metrics in whisper files identified by a dot-separated name, InfluxDB is a lot more flexible. You can choose the database, the measurement name (think db table), and an arbitrary number of key-value tags or fields (think indexed and non-indexed table columns) to store each sample.

The most naive way (current implementation) is to store all series in the same preconfigured database, using the sample dot separated name as the measurement/table name and without using any tags or fields.
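As a rough illustration of that naive mapping (not the code in the branch), the sketch below formats one sample in InfluxDB's line protocol, using the dot-separated name as the measurement. The function name `naive_line` is hypothetical. Since the line protocol requires at least one field, the sketch assumes a single field called `value`, and it assumes timestamps arrive as whole seconds.

```python
def naive_line(name, value, timestamp):
    """Format one sample as an InfluxDB line-protocol point.

    Hypothetical helper: the dot-separated series name becomes the
    measurement, there are no tags, and the sample is stored in a single
    field that this sketch assumes is called "value".
    """
    # Commas and spaces must be escaped in measurement names.
    measurement = name.replace(",", "\\,").replace(" ", "\\ ")
    # The line protocol's default timestamp precision is nanoseconds;
    # integer arithmetic avoids float rounding on large values.
    nanos = int(timestamp) * 10**9
    return "%s value=%r %d" % (measurement, float(value), nanos)


print(naive_line("host1.cpu.0.idle", 98.5, 1434055562))
```

Every series ends up as its own measurement in the one preconfigured database, which is simple but leaves InfluxDB's tags unused.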

Another approach would be to change the format of the data passed from bucky's servers to the clients, so that instead of a (host, name, value, time) tuple, the server would send an arbitrary set of key-value pairs for each sample. For collectd these could be 'plugin', 'plugin_instance', 'type', 'type_instance' etc.; I don't know if something like that would also apply to the metricsd and statsd servers. Clients that support key-value pairs (like InfluxDBClient) would use them as tags, whereas clients that need a single string for the series name would have to serialize them to produce the name.
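To make the two client behaviours concrete, here is a minimal sketch, assuming the server hands each client a host plus a dict of collectd-style keys. The helper names `tagged_line` and `serialize_name`, the fixed key order, and the `value` field name are all assumptions made for illustration, not bucky's actual API.

```python
# Assumed order in which collectd-style keys are joined into a name.
KEY_ORDER = ("plugin", "plugin_instance", "type", "type_instance")


def _escape(text):
    # Tag keys and values must escape commas, equals signs and spaces.
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def tagged_line(measurement, tags, value, timestamp):
    """A tag-aware client: key-value pairs become InfluxDB tags."""
    # Empty tag values are skipped; keys are sorted, which InfluxDB
    # recommends for the line protocol.
    pairs = ["%s=%s" % (_escape(k), _escape(v))
             for k, v in sorted(tags.items()) if v]
    head = ",".join([_escape(measurement)] + pairs)
    return "%s value=%r %d" % (head, float(value), int(timestamp) * 10**9)


def serialize_name(host, metadata):
    """A name-based client: flatten the pairs into one dotted string."""
    parts = [host] + [metadata[k] for k in KEY_ORDER if metadata.get(k)]
    return ".".join(parts)


meta = {"plugin": "cpu", "plugin_instance": "0",
        "type": "cpu", "type_instance": "idle"}
print(tagged_line("cpu", dict(meta, host="host1"), 98.5, 1434055562))
print(serialize_name("host1", meta))
```

The same sample can then feed both kinds of client without either one having to parse a dotted name back into its components.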

A different approach would be to use a scheme like the one used by the graphite plugin of InfluxDB, which is described in https://github.com/influxdb/influxdb/blob/master/services/graphite/README.md. This would be compatible with the current implementation of sending samples to clients with a dot-separated metric name, and would allow for a fully configurable mapping of metric names to InfluxDB measurements and tags.
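For a feel of how such a mapping could work, the sketch below applies one template to one dot-separated name. It is a simplified version loosely modeled on the graphite plugin's templates, not that plugin's exact behaviour: there are no filters, default tags, or custom separators, and `apply_template` is a hypothetical name.

```python
def apply_template(template, name):
    """Split a dotted metric name into (measurement, tags).

    Each dot-separated template part lines up with one part of the name:
      - an empty part skips that part of the name,
      - "measurement" adds it to the measurement name,
      - "measurement*" adds it and everything after it,
      - any other word is used as a tag key.
    """
    parts = name.split(".")
    measurement = []
    tags = {}
    for index, key in enumerate(template.split(".")):
        if index >= len(parts):
            break  # the template is longer than the metric name
        if key == "measurement*":
            measurement.extend(parts[index:])
            break
        if key == "measurement":
            measurement.append(parts[index])
        elif key:
            tags[key] = parts[index]
    return ".".join(measurement), tags


print(apply_template(".host.resource.measurement*",
                     "servers.localhost.cpu.loadavg.10"))
```

Here the leading empty part drops `servers`, `localhost` and `cpu` become the `host` and `resource` tags, and the remainder `loadavg.10` becomes the measurement.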

Any feedback/ideas on how this should be implemented?

Ok, so I've sort of implemented both approaches mentioned above.

The most naive way (current implementation) is to store all series in the same preconfigured database, using the sample dot separated name as the measurement/table name and without using any tags or fields.

This is the default in both approaches.

Another approach would be to change the format of the data passed from bucky's servers to the clients, so that instead of a (host, name, value, time) tuple, the server would send an arbitrary set of key-value pairs for each sample. For collectd these could be 'plugin', 'plugin_instance', 'type', 'type_instance' etc.; I don't know if something like that would also apply to the metricsd and statsd servers. Clients that support key-value pairs (like InfluxDBClient) would use them as tags, whereas clients that need a single string for the series name would have to serialize them to produce the name.

I implemented something along these lines for CollectDServer and InfluxDBClient in dimrozakis@bf7d5bc.

A different approach would be to use a scheme like the one used by the graphite plugin of InfluxDB, which is described in https://github.com/influxdb/influxdb/blob/master/services/graphite/README.md. This would be compatible with the current implementation of sending samples to clients with a dot-separated metric name, and would allow for a fully configurable mapping of metric names to InfluxDB measurements and tags.

This implementation can be found in dimrozakis/bucky@influxdb...dimrozakis:templates.

The templates approach isn't bad but I think that the first implementation that extends bucky's server-client API is the way to go here. @trbs care to give some feedback?

trbs commented

Sorry for the late reply, it's been (and continues to be) busy.

The template approach seems very powerful and flexible, but also slow(er) and more complex to maintain.

I would say both, but I do have some hesitation about the amount of code (and hence the maintainability) of the template option. Having the naive way as the default is best, since I think that's what most people would expect.

Would you like to focus on one implementation? In that case I would recommend the naive approach. Or maybe implement two sets of classes for InfluxDB, which would give the user the option to use either one?