Altinity/clickhouse-grafana

Add macro similar to $rateColumns for counter like metrics

JiriHorky opened this issue · 15 comments

Similarly to request in #78, please consider adding support for computing $perSecond for multiple series (hosts).

Currently, the $rateColumns computes something like:

SELECT t, arrayMap(a -> (a.1, a.2/runningDifference( t/1000 )), groupArr)
FROM (
    SELECT t, groupArray((host, v)) as groupArr 
....

Which does not produce correct results for metrics which only grows over time, as one needs to compute difference between two consequitive values v as well (not just time).

Unfortunatelly, doing runningDifference(a.2)/runningDifference(t/1000) does not work, as a.2 gets materialized to different values v between different columns (host).

I have been scratching my head for a while and I am not sure how to write such query in Clickhouse at all. I hope you know better :)

Hi @JiriHorky
Back to #78
Let's introduce new data:

SELECT *
FROM issue_78

┌───────Date─┬────────────────Time─┬─Packets─┬─Type─┐
│ 2018-08-31 │ 2018-08-31 10:51:00 │    1000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:51:00 │    1000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:52:00 │    2000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:52:00 │    3000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:53:00 │    3000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:53:00 │    5000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:54:00 │    4000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:54:00 │    7000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:55:00 │    5000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:55:00 │    9000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:56:00 │    6000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:56:00 │   11000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:57:00 │    7000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:57:00 │   13000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:58:00 │    8000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:58:00 │   15000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:59:00 │    9000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:59:00 │   17000 │ udp  │
└────────────┴─────────────────────┴─────────┴──────┘

Here we have a Type column and counter Packets. It is obvious, that tcp grows by 1000 per minute (1000/60=16.6 per second) and udp by 2000 per minute (2000/60=33.33 per second).
I think the macros should implement the following query:

SELECT
    t,
    arrayMap(a -> (a.1, a.2 / runningDifference(t)), groupArr) AS am
FROM
(
    SELECT
        t,
        groupArray((Type, runningDifference(c))) AS groupArr
    FROM
    (
        SELECT
            intDiv(toUInt32(Time), 120) * 120 AS t,
            Type,
            max(Packets) AS c
        FROM issue_78
        GROUP BY
            t,
            Type
        ORDER BY
            Type ASC,
            t ASC
    )
    GROUP BY t
    ORDER BY t ASC
)

┌──────────t─┬─am──────────────────────────────────────────────────────┐
│ 1535712600 │ [('tcp',nan),('udp',-inf)]                              │
│ 1535712720 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712840 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712960 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713080 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
└────────────┴─────────────────────────────────────────────────────────┘

5 rows in set. Elapsed: 0.009 sec.

Please, note some details:

  • Selecting max value for period instead of counting, since we're dealing with counter:
max(Packets)
  • The reverted sort in inner query:
 ORDER BY
            Type ASC,
            t ASC
  • The runningDifference comparison which is possibly because of reverse sorting:
 groupArray((Type, runningDifference(c))) AS groupArr

Please, try this construction with your data. If it fits your need I'll add a new macros into plugin.
Thank you!

Hi @hagen1778 ,
Thanks a lot for your reply, it's very useful. I'm from @JiriHorky team and I have been testing your query for a while and I think we have a problem when there are missing rows in ClickHouse table. I will try to explain:

This query I want to create should produce a graph of CPU usage of the particular hosts and we want to be able to see the metric for several hosts. After a small adjustment I made it work with Grafana's variable $host, where we want to select multiple values (hosts).

I adjusted the query like this:

SELECT
   t,
   arrayMap(a -> (a.1, 100 - a.2 / runningDifference(t) * 1000), ga) AS am
FROM
(
   SELECT
       t,
       groupArray((host, runningDifference(v))) AS ga
   FROM
   (
       SELECT
           $timeSeries AS t,
           host,
           max(value) AS v
       FROM $table
       WHERE $timeFilter and host in ($host)
       GROUP BY
           t,
           host
       ORDER BY
           host ASC,
           t ASC
   )
   GROUP BY t
   ORDER BY t ASC
)

Where $table is set to cpu_idle. And as for $host I will select two values S182V06001924 and S182V06001957 for this example .

In ClickHouse we have table cpu_idle and here is an example of data from the two hosts:

┌─toUInt32(time)─┬────────────────time─┬─host──────────┬────value─┐
│     1540810800 │ 2018-10-29 12:00:00 │ S182V06001924 │ 65945602 │
│     1540810800 │ 2018-10-29 12:00:00 │ S182V06001957 │ 19140170 │
│     1540811100 │ 2018-10-29 12:05:00 │ S182V06001924 │ 65973829 │
│     1540811100 │ 2018-10-29 12:05:00 │ S182V06001957 │ 19161774 │
│     1540811400 │ 2018-10-29 12:10:00 │ S182V06001924 │ 66001985 │
│     1540811700 │ 2018-10-29 12:15:00 │ S182V06001924 │ 66029971 │
│     1540812000 │ 2018-10-29 12:20:00 │ S182V06001924 │ 66057855 │
│     1540812300 │ 2018-10-29 12:25:00 │ S182V06001924 │ 66086019 │
│     1540812300 │ 2018-10-29 12:25:00 │ S182V06001957 │ 19224748 │
└────────────────┴─────────────────────┴───────────────┴──────────┘

You may notice that we did not receive data from host S182V06001957 at 12:10, 12:15 and 12:20.

And here comes the issue:
If I try to apply the query on this data and simulate multiselect by WHERE (host IN ('S182V06001957', 'S182V06001924')):

SELECT
    toDateTime(t),
    t,
    arrayMap(a -> (a.1, 100 - (a.2 / runningDifference(t))), groupArr) AS am
FROM
(
    SELECT
        t,
        groupArray((host, runningDifference(v))) AS groupArr
    FROM
    (
        SELECT
            toUInt32(time) AS t,
            host,
            max(value) AS v
        FROM cpu_idle
        WHERE (host IN ('S182V06001957', 'S182V06001924')) AND (time >= '2018-10-29 12:00:00') AND (time <= '2018-10-29 12:25:00')
        GROUP BY
            t,
            host
        ORDER BY
            host ASC,
            t ASC
    )
    GROUP BY t
    ORDER BY t ASC
)

I will get this:

┌───────toDateTime(t)─┬──────────t─┬─am───────────────────────────────────────────────────────────────────────────┐
│ 2018-10-29 12:00:00 │ 1540810800 │ [('S182V06001924',nan),('S182V06001957',inf)]                                │
│ 2018-10-29 12:05:00 │ 1540811100 │ [('S182V06001924',5.909999999999997),('S182V06001957',27.986666666666665)]   │
│ 2018-10-29 12:10:00 │ 1540811400 │ [('S182V06001924',6.146666666666661)]                                        │
│ 2018-10-29 12:15:00 │ 1540811700 │ [('S182V06001924',6.713333333333338)]                                        │
│ 2018-10-29 12:20:00 │ 1540812000 │ [('S182V06001924',7.0533333333333275)]                                       │
│ 2018-10-29 12:25:00 │ 1540812300 │ [('S182V06001924',6.1200000000000045),('S182V06001957',-109.91333333333333)] │
└─────────────────────┴────────────┴──────────────────────────────────────────────────────────────────────────────┘

But if I select just S182V06001957 I will get this:

┌───────toDateTime(t)─┬──────────t─┬─am─────────────────────────────────────┐
│ 2018-10-29 12:00:00 │ 1540810800 │ [('S182V06001957',nan)]                │
│ 2018-10-29 12:05:00 │ 1540811100 │ [('S182V06001957',27.986666666666665)] │
│ 2018-10-29 12:25:00 │ 1540812300 │ [('S182V06001957',47.52166666666667)]  │
└─────────────────────┴────────────┴────────────────────────────────────────┘

As you can see in the last row the value we get for S182V06001957 (and thus the graph) are erroneous in case we use multiselect and some data are missing. In other words, the curves in the graph change depending on what host(-s) I select.

Could you please help me with that? Have you got any idea how to get rid of this? I tried to adjust the query for a while but unsuccessfully.

In case you need any further info, please ask.
Thanks a lot.

Hi @MichalNedbalek,

I guess I got the idea. Let's introduce updated data:

┌──────────t─┬─Type─┬─────m─┐
│ 1535712600 │ tcp  │  1000 │
│ 1535712720 │ tcp  │  3000 │
│ 1535712840 │ tcp  │  5000 │
│ 1535712960 │ tcp  │  7000 │
│ 1535713080 │ tcp  │  9000 │
│ 1535713200 │ tcp  │ 11000 │
│ 1535713320 │ tcp  │ 13000 │
│ 1535713440 │ tcp  │ 15000 │
│ 1535713560 │ tcp  │ 17000 │
│ 1535712600 │ udp  │  1000 │
│ 1535712720 │ udp  │  5000 │
│ 1535712840 │ udp  │  9000 │
│ 1535712960 │ udp  │ 13000 │
│ 1535713080 │ udp  │ 17000 │
│ 1535713440 │ udp  │ 29000 │
│ 1535713560 │ udp  │ 33000 │
└────────────┴──────┴───────┘

We see a little time gap for udp type between last two rows. To avoid false results I suggest to use following query:

SELECT
    t * 1000 AS t,
    groupArray((Type, c)) AS groupArr
FROM
(
    SELECT
        t,
        Type,
        runningDifference(m) AS diff,
        if(diff < 0, nan, diff / runningDifference(t)) AS c
    FROM
    (
        SELECT
            intDiv(toUInt32(Time), 120) * 120 AS t,
            Type,
            max(Packets) AS m
        FROM issue_78
        GROUP BY
            t,
            Type
        ORDER BY
            Type ASC,
            t ASC
    )
)
GROUP BY t
ORDER BY t ASC

┌─────────────t─┬─groupArr────────────────────────────────────────────────┐
│ 1535712600000 │ [('tcp',nan),('udp',nan)]                               │
│ 1535712720000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712840000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712960000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713080000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713200000 │ [('tcp',16.666666666666668)]                            │
│ 1535713320000 │ [('tcp',16.666666666666668)]                            │
│ 1535713440000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713560000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
└───────────────┴─────────────────────────────────────────────────────────┘

The following operations:

SELECT
        runningDifference(m) AS diff,
        if(diff < 0, nan, diff / runningDifference(t)) AS c

gives us opportunity to compare values of the same type considering the time gaps. Also, the condition if(diff < 0 avoids false results of comparing the values for different types.

The result of the query could be easily displayed in Grafana, since plugin knows how to behave with null values. Anyway, you still have a possibility to manage the way of displaying the null values on panel:
download

Hope that will help!

Hi @hagen1778,

This works perfectly, thank you very much. :) My final version of the query look like this:

SELECT
    t,
    groupArray((host, c)) AS groupArr
FROM
(
    SELECT
        t,
        host,
        runningDifference(v) AS diff,
        if(diff < 0, nan, 100 - diff / runningDifference(t) * 1000) AS c
    FROM
    (
        SELECT
            $timeSeries AS t,
            host,
            max(value) AS v
        FROM $table
        WHERE $timeFilter and host in ($host)
        GROUP BY
            t,
            host
        ORDER BY
            host ASC,
            t ASC
    )
)
GROUP BY t
ORDER BY t ASC

Are going to create the macro for this? If so, I would wait for it, because I want to implement similar queries in some 20+ other graphs and the macro would make everything easier and faster.

Thanks a lot.

Yes, I will implement it. New release will published this week

Thanks for info

Hi @MichalNedbalek
Why there is an additional sub operation in if(diff < 0, nan, 100 - diff / runningDifference(t) * 1000) AS c? I mean the following 100 - diff

Hi @hagen1778
Because in this particular case we get cpu_idle metric from the hosts. So when I want to calculate percentage of CPU usage I need to subtract the value from 100.

Currently I don't see an easy way to make customizable macros where you will be able to do 100 - diff. The macros is following:
$perSecondColumns(type, total) FROM requests where type in ('udp', 'tcp')
Which results into:

SELECT
    t,
    groupArray((Type, max_0_Rate)) AS groupArr
FROM
(
    SELECT
        t,
        type,
        if(runningDifference(max_0) < 0, nan, runningDifference(max_0) / runningDifference(t / 1000)) AS max_0_Rate
    FROM
    (
        SELECT
            $timeSeries AS t,
            type,
            max(total) AS max_0
        FROM $table
        WHERE $timeFilter 
        AND (Type IN ('udp', 'tcp'))
        GROUP BY
            t,
            type
        ORDER BY
            type ASC,
            t ASC
    )
)
GROUP BY t
ORDER BY t ASC

In the inner query we need to get max value for column (so it should be just it's name) and in second query apply runningDifference. I'm wondering how should I support passing extra operations here...

The both $perSecond and $perSecondColumns are in the master right now. Could you test your queries with that update?

I have to wait for a colleague who will update our Grafana and I'm going to test it tomorrow. Thanks for the macros.

Hi @hagen1778 ,
We tried the new version and we ended up with all graphs not working with error query is undefined and stacktrace like this:

execute/SqlQuery</SqlQuery.unescape@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/sql_query.js:428:25
execute/SqlQuery</SqlQuery.prototype.replace@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/sql_query.js:101:29
execute/ClickHouseDatasource</ClickHouseDatasource.prototype.query/<@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/datasource.js:84:33
h@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:1:53965
ls@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:1:95226
execute/ClickHouseDatasource</ClickHouseDatasource.prototype.query@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/datasource.js:81:21
1752/g</e.prototype.issueQueries@https://grafana.avast.com/public/build/0.6aa12092d3333cb24394.js:7:18163
1818/p</e.prototype.issueQueries@https://grafana.avast.com/public/build/0.6aa12092d3333cb24394.js:7:223548
u@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:68722
l/<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:69127
$digest@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:74824
$apply@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:76636
o/c<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:84010
i@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:21868
mt/l.defer/n<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:23332

Today we will try to install it to our testing Grafana and test it there.

As for the 100 - diff queries, I think it is not necessary to be concerned with it for now. I will use the query as it is now and later we will change the metrics we get from the hosts and the query will get changed anyway.

Thanks

Yep, my fault. Should be fixed now. Could you try it again?

Hi @hagen1778,

Sorry for the delay and thank you for the fix. I finally got to the testing of your new macros. It looks ok from my point of view and it is perfectly usable (except for the special case higher where I'm gonna use the slightly complicated query because of the 100 - diff).

So what are the next steps now?

Thanks