Apache Cassandra & DataStax Enterprise Datasource for Grafana

Primary language: Go. License: MIT.

Cassandra DataSource for Grafana

Apache Cassandra Datasource for Grafana. This datasource visualises time-series data stored in Cassandra/DSE. If you are looking for Cassandra metrics, you may need datastax/metric-collector-for-apache-cassandra instead.

To see the datasource in action, please follow the Quick Demo steps. Documentation is available here

Supports:

  • Grafana
    • 7.x, 8.x, 9.x, 10.x are fully supported (plugin version 2.x)
    • 5.x, 6.x are deprecated (works with plugin versions 1.x, but we recommend upgrading)
  • Cassandra 3.x, 4.x
  • DataStax Enterprise 6.x
  • DataStax Astra (docs)
  • AWS Keyspaces (limited support) (docs)
  • Linux, OSX (incl. M1), Windows

Contacts:

  • Discord Chat
  • GitHub discussions

Usage

You can find more detailed instructions in the datasource wiki.

Installation

  1. Install the plugin using the Grafana console tool: grafana-cli plugins install hadesarchitect-cassandra-datasource. The plugin will be installed into your Grafana plugins directory; the default is /var/lib/grafana/plugins. Alternatively, download cassandra-datasource-VERSION.zip from the latest release and extract it into the Grafana plugins directory (grafana/plugins).
  2. Add the Apache Cassandra Data Source as a data source on the datasource configuration page.
  3. Configure the datasource by specifying the contact point and port (e.g. "10.11.12.13:9042"), username and password. It is strongly recommended to use a dedicated user with read-only permissions limited to the tables you need to access.
  4. Press the "Save and Test" button. If there is an error message, check the credentials and connection.

Datasource Configuration

Panel Setup

There are two ways to query data from Cassandra/DSE: Query Configurator and Query Editor. The Configurator is easier to use but has limited capabilities; the Editor is more powerful but requires an understanding of CQL.

Query Configurator

Query Configurator is the easiest way to query data. First, enter the keyspace and table name, then pick the proper columns. If the keyspace and table names are correct, the datasource will suggest the column names automatically.

  • Time Column - the column storing the timestamp value; it answers the "when" question.
  • Value Column - the column storing the value you'd like to show, e.g. a temperature or any other property you need.
  • ID Column - the column that uniquely identifies the source of the data, e.g. sensor_id, shop_id or anything else that identifies the origin of the data.

After that, specify the ID Value, the particular ID of the data origin you want to show. You may need to enable "ALLOW FILTERING", although we recommend avoiding it.

Example: Imagine you want to visualise the reports of a temperature sensor installed in your smart home. Given that the sensor reports its ID, time, location and temperature every minute, we create a table to store the data and put some values there:

CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    registered_at timestamp,
    temperature int,
    location text,
    PRIMARY KEY ((sensor_id), registered_at)
);

INSERT INTO temperature (sensor_id, registered_at, temperature, location) VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:21:59.001+0000', 18, 'kitchen');
INSERT INTO temperature (sensor_id, registered_at, temperature, location) VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:22:59.001+0000', 19, 'kitchen');
INSERT INTO temperature (sensor_id, registered_at, temperature, location) VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:23:59.001+0000', 20, 'kitchen');

In this case, we have to fill the configurator fields the following way to get the results:

  • Keyspace - smarthome (keyspace name)
  • Table - temperature (table name)
  • Time Column - registered_at (occurrence)
  • Value Column - temperature (value to show)
  • ID Column - sensor_id (ID of the data origin)
  • ID Value - 99051fe9-6a9c-46c2-b949-38ef78858dd0 (ID of the sensor)
  • ALLOW FILTERING - FALSE (not required, so we are happy to avoid it)

If there are several origins (multiple sensors), you will need to add more rows. If your case is as simple as that, the Query Configurator is a good choice; otherwise, please proceed to the Query Editor.
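One way to picture the configurator is as a mapping from its fields to a CQL query. The Python sketch below is illustrative only: build_query is a hypothetical helper, not part of the plugin, and the query the datasource actually generates may differ in its details.

```python
def build_query(keyspace, table, time_col, value_col, id_col, id_value,
                allow_filtering=False):
    """Assemble a CQL query from configurator-style fields (hypothetical helper)."""
    query = (
        f"SELECT {id_col}, {value_col}, {time_col} "
        f"FROM {keyspace}.{table} "
        f"WHERE {id_col} = {id_value} "
        f"AND {time_col} > $__timeFrom AND {time_col} < $__timeTo"
    )
    # ALLOW FILTERING is appended only when explicitly requested.
    if allow_filtering:
        query += " ALLOW FILTERING"
    return query


# Field values taken from the smart-home example above.
example = build_query(
    keyspace="smarthome",
    table="temperature",
    time_col="registered_at",
    value_col="temperature",
    id_col="sensor_id",
    id_value="99051fe9-6a9c-46c2-b949-38ef78858dd0",
)
print(example)
```

The ID column is placed first in the SELECT list, and the time range is left as placeholders for the datasource to fill in.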

Query Editor

Query Editor is a more powerful way to query data. To enable the query editor, press the "toggle text edit mode" button.

Query Editor unlocks all the possibilities of CQL, including User-Defined Functions, aggregations, etc.

Example (using the sample table from the Query Configurator case):

SELECT sensor_id, temperature, registered_at, location FROM test.test WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd1, 99051fe9-6a9c-46c2-b949-38ef78858dd0) AND registered_at > $__timeFrom and registered_at < $__timeTo
  1. The order of fields in the SELECT expression doesn't matter, except for the ID field. This field is used to distinguish different time series, so it must be in the first position.
    • Identifier - the first field in the SELECT expression must be the ID, something that uniquely identifies the data (e.g. sensor_id).
    • Value - there should be at least one numeric value among the returned fields if the query result will be used to draw a graph.
    • Timestamp - there should be one timestamp value if the query result will be used to draw a graph.
    • There can be any number of additional fields. However, be cautious when using multiple numeric fields, as Grafana interprets them as values and draws them on the TimeSeries graph.
    • Any field returned by the query is available for use in the Alias template, e.g. {{ location }}. The datasource interpolates such strings and updates the graph legend.
    • The datasource will try to keep all the fields, but this is not always possible because Cassandra and Grafana support different sets of types. Unsupported fields are removed from the response.
  2. To filter data by time, use the $__timeFrom and $__timeTo placeholders as in the example. The datasource replaces them with the time values from the panel. Note: it is important to add the placeholders, otherwise the query will try to fetch data for the whole period of time. Don't specify the timeframe yourself, just use the placeholders; setting the time limits is Grafana's job.
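To make the placeholder mechanism concrete, here is a minimal Python sketch of the substitution. It is not the plugin's implementation: expand_time_macros is a hypothetical helper, and rendering the panel times as epoch milliseconds is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def expand_time_macros(query, time_from, time_to):
    """Replace the time placeholders with the panel's time range (hypothetical helper).

    Assumption: values are rendered as epoch milliseconds.
    """
    return (
        query
        .replace("$__timeFrom", str(to_millis(time_from)))
        .replace("$__timeTo", str(to_millis(time_to)))
    )


template = (
    "SELECT sensor_id, temperature, registered_at FROM test.test "
    "WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0 "
    "AND registered_at > $__timeFrom AND registered_at < $__timeTo"
)
expanded = expand_time_macros(
    template,
    datetime(2020, 4, 1, 11, 0, tzinfo=timezone.utc),
    datetime(2020, 4, 1, 12, 0, tzinfo=timezone.utc),
)
print(expanded)
```

Because the query text only contains placeholders, the same panel query keeps working when the dashboard time range changes.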

Table Mode

In addition to TimeSeries mode, the datasource supports Table mode, which draws tables from Cassandra query results. Use Merge, Sort by, Organize fields and other transformations to shape the table as needed. There are two ways to plot only the last (most recent) values rather than the whole time series.

  1. Inefficient way

If the table was created with the default ascending ordering, the most recent value is always stored at the end of the partition. To retrieve it, the query must use ORDER BY in descending order together with LIMIT:

SELECT sensor_id, temperature, registered_at, location
FROM test.test
WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__timeFrom AND registered_at < $__timeTo
ORDER BY registered_at DESC
LIMIT 1

Note that the WHERE IN () clause cannot be used with ORDER BY, so the query must be duplicated for each additional sensor_id.

  2. Efficient way

To query the most recent values efficiently, the ordering must be specified when the table is created:

CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    registered_at timestamp,
    temperature int,
    location text,
    PRIMARY KEY ((sensor_id), registered_at)
) WITH CLUSTERING ORDER BY (registered_at DESC);

After that, the most recent value is always stored at the beginning of the partition and can be queried with just a LIMIT clause:

SELECT sensor_id, temperature, registered_at, location
FROM test.test
WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd0, 99051fe9-6a9c-46c2-b949-38ef78858dd1)
AND registered_at > $__timeFrom AND registered_at < $__timeTo
PER PARTITION LIMIT 1

Note that PER PARTITION LIMIT 1 is used instead of LIMIT 1 to return one row for each partition rather than one row in total.
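The difference between the two limits can be illustrated with a small in-memory simulation. This Python sketch only mimics the semantics on hypothetical sample rows; it does not talk to Cassandra, and the sensor names and readings are made up for the example.

```python
from itertools import groupby

# Hypothetical rows as (sensor_id, registered_at, temperature), laid out the way a
# table with CLUSTERING ORDER BY (registered_at DESC) returns them: grouped by
# partition, newest row first within each partition.
ROWS = [
    ("sensor-a", "2020-04-01T11:23:59", 20),
    ("sensor-a", "2020-04-01T11:22:59", 19),
    ("sensor-b", "2020-04-01T11:23:30", 25),
    ("sensor-b", "2020-04-01T11:22:30", 24),
]


def limit(rows, n):
    """Mimic LIMIT n: at most n rows in total."""
    return rows[:n]


def per_partition_limit(rows, n):
    """Mimic PER PARTITION LIMIT n: at most n rows from each partition."""
    result = []
    for _, partition in groupby(rows, key=lambda row: row[0]):
        result.extend(list(partition)[:n])
    return result


print(limit(ROWS, 1))                # only the newest row of the first partition
print(per_partition_limit(ROWS, 1))  # the newest row of every partition
```

With descending clustering order, the first row of each partition is its most recent reading, which is why no ORDER BY is needed in this case.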

Unix epoch time format

Usually there are no problems: Cassandra can store timestamps in different formats, as shown in the documentation. However, this is not always enough. One possible case is unix time, which is just a number of seconds or milliseconds and is usually stored as an integer type.

  1. If time is stored as a number of milliseconds in a bigint column, it should be converted into the timestamp type before the data is returned to Grafana:
SELECT sensor_id, temperature, dateOf(maxTimeuuid(registered_at)), location
FROM test.test WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__timeFrom and registered_at < $__timeTo

This query returns a proper timestamp even if the time is stored as a number of milliseconds.

  2. If time is stored as a number of seconds, it cannot be converted into a timestamp natively, but there is a trick:
SELECT sensor_id, temperature, dateOf(maxTimeuuid(registered_at*1000)), location
FROM test.test WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__unixEpochFrom and registered_at < $__unixEpochTo
  • There are two important parts in this query:
    • dateOf(maxTimeuuid(registered_at*1000)) converts seconds to milliseconds (registered_at*1000) and then converts the milliseconds to the timestamp type, which is handed over to Grafana.
    • $__unixEpochFrom and $__unixEpochTo are variables holding unix time in seconds; they are used to fill in the conditions of the query.
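The arithmetic behind the trick can be checked outside Cassandra. The Python sketch below mirrors the two steps (seconds to milliseconds, then milliseconds to a timestamp); the function names are hypothetical and the sample value is chosen to match the first reading of the example table, truncated to whole seconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_to_millis(seconds):
    """Step 1: same as registered_at*1000 in the query."""
    return seconds * 1000


def millis_to_timestamp(millis):
    """Step 2: interpret milliseconds since the unix epoch as a UTC timestamp."""
    return EPOCH + timedelta(milliseconds=millis)


stored_seconds = 1585740119  # hypothetical value of a registered_at column in seconds
timestamp = millis_to_timestamp(seconds_to_millis(stored_seconds))
print(timestamp.isoformat())
```

Skipping the multiplication would treat the seconds value as milliseconds and place the reading in January 1970, which is why the conversion has to happen before the value reaches Grafana.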

Development

Developer documentation