Add macro similar to $rateColumns for counter-like metrics
JiriHorky opened this issue · 15 comments
Similarly to the request in #78, please consider adding support for computing $perSecond for multiple series (hosts).
Currently, $rateColumns computes something like:
SELECT t, arrayMap(a -> (a.1, a.2/runningDifference( t/1000 )), groupArr)
FROM (
SELECT t, groupArray((host, v)) as groupArr
....
This does not produce correct results for metrics that only grow over time, as one needs to compute the difference between two consecutive values v as well (not just between timestamps).
Unfortunately, doing runningDifference(a.2)/runningDifference(t/1000) does not work either, as a.2 gets materialized to values v coming from different columns (host), so the difference mixes series.
I have been scratching my head for a while and I am not sure how to write such a query in ClickHouse at all. I hope you know better :)
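The failure mode can be sketched in plain Python (a toy simulation of runningDifference, not actual plugin or ClickHouse code):

```python
# Toy simulation of ClickHouse's runningDifference.
# It subtracts the previous row's value, so on rows that interleave
# hosts it mixes counters from different series.

def running_difference(values):
    """Mimic runningDifference: the first row yields 0."""
    out, prev = [], None
    for v in values:
        out.append(0 if prev is None else v - prev)
        prev = v
    return out

# (host, counter) rows as they arrive, ordered by time only:
rows = [("a", 100), ("b", 500), ("a", 200), ("b", 700)]
deltas = running_difference([v for _, v in rows])
# deltas is [0, 400, -300, 500]: cross-series garbage instead of the
# true per-host increments (100 for "a", 200 for "b")
```

Each delta compares a value of one host with the previous row's value of the other host, which is exactly the materialization problem described above.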
Hi @JiriHorky
Back to #78
Let's introduce new data:
SELECT *
FROM issue_78
┌───────Date─┬────────────────Time─┬─Packets─┬─Type─┐
│ 2018-08-31 │ 2018-08-31 10:51:00 │    1000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:51:00 │    1000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:52:00 │    2000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:52:00 │    3000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:53:00 │    3000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:53:00 │    5000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:54:00 │    4000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:54:00 │    7000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:55:00 │    5000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:55:00 │    9000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:56:00 │    6000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:56:00 │   11000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:57:00 │    7000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:57:00 │   13000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:58:00 │    8000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:58:00 │   15000 │ udp  │
│ 2018-08-31 │ 2018-08-31 10:59:00 │    9000 │ tcp  │
│ 2018-08-31 │ 2018-08-31 10:59:00 │   17000 │ udp  │
└────────────┴─────────────────────┴─────────┴──────┘
Here we have a Type column and a counter Packets. It is obvious that tcp grows by 1000 per minute (1000/60 = 16.6 per second) and udp by 2000 per minute (2000/60 = 33.33 per second).
I think the macros should implement the following query:
SELECT
t,
arrayMap(a -> (a.1, a.2 / runningDifference(t)), groupArr) AS am
FROM
(
SELECT
t,
groupArray((Type, runningDifference(c))) AS groupArr
FROM
(
SELECT
intDiv(toUInt32(Time), 120) * 120 AS t,
Type,
max(Packets) AS c
FROM issue_78
GROUP BY
t,
Type
ORDER BY
Type ASC,
t ASC
)
GROUP BY t
ORDER BY t ASC
)
┌──────────t─┬─am──────────────────────────────────────────────────────┐
│ 1535712600 │ [('tcp',nan),('udp',-inf)] │
│ 1535712720 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712840 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712960 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713080 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
└────────────┴─────────────────────────────────────────────────────────┘
5 rows in set. Elapsed: 0.009 sec.
Please note some details:
- We select the max value per period instead of counting, since we're dealing with a counter:
max(Packets)
- The inner query sorts by Type before t:
ORDER BY
Type ASC,
t ASC
- The runningDifference comparison, which is only valid thanks to that sort order:
groupArray((Type, runningDifference(c))) AS groupArr
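The sort-then-diff idea can be sketched in plain Python (hypothetical data; running_difference mimics the ClickHouse function):

```python
# Ordering by (Type, t) makes consecutive rows belong to the same series,
# so differencing the counter column yields valid per-series deltas;
# only the boundary row between two series compares across types.

def running_difference(values):
    """Mimic ClickHouse runningDifference: the first row yields 0."""
    out, prev = [], None
    for v in values:
        out.append(0 if prev is None else v - prev)
        prev = v
    return out

rows = sorted([
    ("udp", 120, 2000), ("tcp", 0, 1000), ("udp", 0, 1000),
    ("tcp", 120, 2000), ("tcp", 240, 3000), ("udp", 240, 3000),
])  # equivalent of ORDER BY Type ASC, t ASC
deltas = running_difference([c for _, _, c in rows])
# [0, 1000, 1000, -2000, 1000, 1000]: correct within tcp and within udp,
# with one bogus negative value at the tcp/udp boundary
```

That single bogus boundary value is why the first bucket of each series shows nan in the result table.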
Please try this construction with your data. If it fits your needs, I'll add a new macro to the plugin.
Thank you!
Hi @hagen1778 ,
Thanks a lot for your reply, it's very useful. I'm from @JiriHorky's team and I have been testing your query for a while; I think we have a problem when there are missing rows in the ClickHouse table. I will try to explain:
The query should produce a graph of CPU usage for particular hosts, and we want to be able to see the metric for several hosts at once. After a small adjustment I made it work with Grafana's variable $host, where we select multiple values (hosts).
I adjusted the query like this:
SELECT
t,
arrayMap(a -> (a.1, 100 - a.2 / runningDifference(t) * 1000), ga) AS am
FROM
(
SELECT
t,
groupArray((host, runningDifference(v))) AS ga
FROM
(
SELECT
$timeSeries AS t,
host,
max(value) AS v
FROM $table
WHERE $timeFilter and host in ($host)
GROUP BY
t,
host
ORDER BY
host ASC,
t ASC
)
GROUP BY t
ORDER BY t ASC
)
Where $table is set to cpu_idle. As for $host, I will select the two values S182V06001924 and S182V06001957 for this example.
In ClickHouse we have the table cpu_idle; here is an example of data from the two hosts:
┌─toUInt32(time)─┬────────────────time─┬─host──────────┬────value─┐
│ 1540810800 │ 2018-10-29 12:00:00 │ S182V06001924 │ 65945602 │
│ 1540810800 │ 2018-10-29 12:00:00 │ S182V06001957 │ 19140170 │
│ 1540811100 │ 2018-10-29 12:05:00 │ S182V06001924 │ 65973829 │
│ 1540811100 │ 2018-10-29 12:05:00 │ S182V06001957 │ 19161774 │
│ 1540811400 │ 2018-10-29 12:10:00 │ S182V06001924 │ 66001985 │
│ 1540811700 │ 2018-10-29 12:15:00 │ S182V06001924 │ 66029971 │
│ 1540812000 │ 2018-10-29 12:20:00 │ S182V06001924 │ 66057855 │
│ 1540812300 │ 2018-10-29 12:25:00 │ S182V06001924 │ 66086019 │
│ 1540812300 │ 2018-10-29 12:25:00 │ S182V06001957 │ 19224748 │
└────────────────┴─────────────────────┴───────────────┴──────────┘
You may notice that we did not receive data from host S182V06001957 at 12:10, 12:15 and 12:20.
And here comes the issue: if I apply the query to this data and simulate the multiselect with WHERE (host IN ('S182V06001957', 'S182V06001924')):
SELECT
toDateTime(t),
t,
arrayMap(a -> (a.1, 100 - (a.2 / runningDifference(t))), groupArr) AS am
FROM
(
SELECT
t,
groupArray((host, runningDifference(v))) AS groupArr
FROM
(
SELECT
toUInt32(time) AS t,
host,
max(value) AS v
FROM cpu_idle
WHERE (host IN ('S182V06001957', 'S182V06001924')) AND (time >= '2018-10-29 12:00:00') AND (time <= '2018-10-29 12:25:00')
GROUP BY
t,
host
ORDER BY
host ASC,
t ASC
)
GROUP BY t
ORDER BY t ASC
)
I will get this:
┌───────toDateTime(t)─┬──────────t─┬─am───────────────────────────────────────────────────────────────────────────┐
│ 2018-10-29 12:00:00 │ 1540810800 │ [('S182V06001924',nan),('S182V06001957',inf)] │
│ 2018-10-29 12:05:00 │ 1540811100 │ [('S182V06001924',5.909999999999997),('S182V06001957',27.986666666666665)] │
│ 2018-10-29 12:10:00 │ 1540811400 │ [('S182V06001924',6.146666666666661)] │
│ 2018-10-29 12:15:00 │ 1540811700 │ [('S182V06001924',6.713333333333338)] │
│ 2018-10-29 12:20:00 │ 1540812000 │ [('S182V06001924',7.0533333333333275)] │
│ 2018-10-29 12:25:00 │ 1540812300 │ [('S182V06001924',6.1200000000000045),('S182V06001957',-109.91333333333333)] │
└─────────────────────┴────────────┴──────────────────────────────────────────────────────────────────────────────┘
But if I select just S182V06001957, I get this:
┌───────toDateTime(t)─┬──────────t─┬─am─────────────────────────────────────┐
│ 2018-10-29 12:00:00 │ 1540810800 │ [('S182V06001957',nan)] │
│ 2018-10-29 12:05:00 │ 1540811100 │ [('S182V06001957',27.986666666666665)] │
│ 2018-10-29 12:25:00 │ 1540812300 │ [('S182V06001957',47.52166666666667)] │
└─────────────────────┴────────────┴────────────────────────────────────────┘
As you can see in the last row, the value we get for S182V06001957 (and thus the graph) is erroneous when we use multiselect and some data is missing. In other words, the curves in the graph change depending on which host(s) I select.
Could you please help me with that? Have you got any idea how to get rid of this? I tried to adjust the query for a while, but without success.
In case you need any further info, please ask.
Thanks a lot.
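For what it's worth, the erroneous last-row value can be reproduced from the numbers in the tables above with a bit of plain Python:

```python
# The counter delta for S182V06001957 spans a 20-minute data gap, but the
# outer query divides it by the 5-minute step between adjacent buckets,
# inflating the rate 4x.

delta_v = 19224748 - 19161774            # counter growth across the gap
wrong = 100 - delta_v / 300              # divided by one bucket step (5 min)
right = 100 - delta_v / 1200             # divided by the real gap (20 min)
print(round(wrong, 2), round(right, 2))  # -109.91 47.52
```

These match the -109.91333... value from the multiselect result and the 47.52166... value from the single-host result.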
Hi @MichalNedbalek,
I guess I got the idea. Let's introduce updated data:
┌──────────t─┬─Type─┬─────m─┐
│ 1535712600 │ tcp │ 1000 │
│ 1535712720 │ tcp │ 3000 │
│ 1535712840 │ tcp │ 5000 │
│ 1535712960 │ tcp │ 7000 │
│ 1535713080 │ tcp │ 9000 │
│ 1535713200 │ tcp │ 11000 │
│ 1535713320 │ tcp │ 13000 │
│ 1535713440 │ tcp │ 15000 │
│ 1535713560 │ tcp │ 17000 │
│ 1535712600 │ udp │ 1000 │
│ 1535712720 │ udp │ 5000 │
│ 1535712840 │ udp │ 9000 │
│ 1535712960 │ udp │ 13000 │
│ 1535713080 │ udp │ 17000 │
│ 1535713440 │ udp │ 29000 │
│ 1535713560 │ udp │ 33000 │
└────────────┴──────┴───────┘
We see a small time gap for the udp type between the last two rows. To avoid false results I suggest using the following query:
SELECT
t * 1000 AS t,
groupArray((Type, c)) AS groupArr
FROM
(
SELECT
t,
Type,
runningDifference(m) AS diff,
if(diff < 0, nan, diff / runningDifference(t)) AS c
FROM
(
SELECT
intDiv(toUInt32(Time), 120) * 120 AS t,
Type,
max(Packets) AS m
FROM issue_78
GROUP BY
t,
Type
ORDER BY
Type ASC,
t ASC
)
)
GROUP BY t
ORDER BY t ASC
┌─────────────t─┬─groupArr────────────────────────────────────────────────┐
│ 1535712600000 │ [('tcp',nan),('udp',nan)] │
│ 1535712720000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712840000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535712960000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713080000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713200000 │ [('tcp',16.666666666666668)] │
│ 1535713320000 │ [('tcp',16.666666666666668)] │
│ 1535713440000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
│ 1535713560000 │ [('tcp',16.666666666666668),('udp',33.333333333333336)] │
└───────────────┴─────────────────────────────────────────────────────────┘
The following operations:
SELECT
runningDifference(m) AS diff,
if(diff < 0, nan, diff / runningDifference(t)) AS c
give us the opportunity to compare values of the same type while accounting for the time gaps. Also, the if(diff < 0, ...) condition avoids false results from comparing values across different types: at the boundary between two sorted series the counter drops, so the difference is negative.
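The whole construction can be sketched in plain Python (a toy per_second helper made up for illustration, not ClickHouse code):

```python
# Rows sorted by (series, t), rate computed against the real time gap,
# and negative diffs (series boundaries, where the "counter" drops)
# mapped to NaN so they stay off the chart.

import math

def per_second(rows):
    """rows: (series, t_seconds, counter) tuples; sorted internally."""
    out, prev_t, prev_c = [], None, None
    for s, t, c in sorted(rows):
        if prev_c is None:
            out.append((s, t, math.nan))
        else:
            diff = c - prev_c
            out.append((s, t, math.nan if diff < 0 else diff / (t - prev_t)))
        prev_t, prev_c = t, c
    return out

rows = [("tcp", 0, 1000), ("tcp", 120, 3000), ("udp", 0, 500), ("udp", 240, 900)]
rates = per_second(rows)
# tcp at t=120: 2000/120 per second; the first udp row becomes NaN
# (negative boundary diff), then udp at t=240: 400/240 per second
```

Note that dividing by the actual t difference (not a fixed bucket step) is what keeps rates correct across data gaps.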
The result of the query can easily be displayed in Grafana, since the plugin knows how to handle null values. You still have the possibility to manage how null values are displayed on the panel.
Hope that will help!
Hi @hagen1778,
This works perfectly, thank you very much. :) My final version of the query looks like this:
SELECT
t,
groupArray((host, c)) AS groupArr
FROM
(
SELECT
t,
host,
runningDifference(v) AS diff,
if(diff < 0, nan, 100 - diff / runningDifference(t) * 1000) AS c
FROM
(
SELECT
$timeSeries AS t,
host,
max(value) AS v
FROM $table
WHERE $timeFilter and host in ($host)
GROUP BY
t,
host
ORDER BY
host ASC,
t ASC
)
)
GROUP BY t
ORDER BY t ASC
Are you going to create the macro for this? If so, I would wait for it, because I want to implement similar queries in some 20+ other graphs and the macro would make everything easier and faster.
Thanks a lot.
Yes, I will implement it. A new release will be published this week.
Thanks for info
Hi @MichalNedbalek
Why is there an additional operation in if(diff < 0, nan, 100 - diff / runningDifference(t) * 1000) AS c? I mean the 100 - diff part.
Hi @hagen1778
Because in this particular case we get the cpu_idle metric from the hosts. So when I want to calculate the percentage of CPU usage, I need to subtract the value from 100.
Currently I don't see an easy way to make the macro customizable so that you would be able to do 100 - diff. The macro is the following:
$perSecondColumns(type, total) FROM requests where type in ('udp', 'tcp')
Which results in:
SELECT
t,
groupArray((Type, max_0_Rate)) AS groupArr
FROM
(
SELECT
t,
type,
if(runningDifference(max_0) < 0, nan, runningDifference(max_0) / runningDifference(t / 1000)) AS max_0_Rate
FROM
(
SELECT
$timeSeries AS t,
type,
max(total) AS max_0
FROM $table
WHERE $timeFilter
AND (Type IN ('udp', 'tcp'))
GROUP BY
t,
type
ORDER BY
type ASC,
t ASC
)
)
GROUP BY t
ORDER BY t ASC
In the inner query we need to get the max value for a column (so the macro argument should be just its name), and in the second query we apply runningDifference. I'm wondering how I should support passing extra operations here...
Both $perSecond and $perSecondColumns are in master right now. Could you test your queries with that update?
I have to wait for a colleague who will update our Grafana; I'm going to test it tomorrow. Thanks for the macros.
Hi @hagen1778 ,
We tried the new version and ended up with all graphs failing with the error query is undefined and a stacktrace like this:
execute/SqlQuery</SqlQuery.unescape@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/sql_query.js:428:25
execute/SqlQuery</SqlQuery.prototype.replace@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/sql_query.js:101:29
execute/ClickHouseDatasource</ClickHouseDatasource.prototype.query/<@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/datasource.js:84:33
h@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:1:53965
ls@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:1:95226
execute/ClickHouseDatasource</ClickHouseDatasource.prototype.query@https://grafana.avast.com/public/plugins/vertamedia-clickhouse-datasource/datasource.js:81:21
1752/g</e.prototype.issueQueries@https://grafana.avast.com/public/build/0.6aa12092d3333cb24394.js:7:18163
1818/p</e.prototype.issueQueries@https://grafana.avast.com/public/build/0.6aa12092d3333cb24394.js:7:223548
u@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:68722
l/<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:69127
$digest@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:74824
$apply@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:76636
o/c<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:84010
i@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:21868
mt/l.defer/n<@https://grafana.avast.com/public/build/vendor.6aa12092d3333cb24394.js:87:23332
Today we will try to install it on our testing Grafana and test it there.
As for the 100 - diff queries, I think it is not necessary to be concerned with them for now. I will use the query as it is, and later we will change the metrics we get from the hosts, so the query will change anyway.
Thanks
Yep, my fault. Should be fixed now. Could you try it again?
@MichalNedbalek ping
Hi @hagen1778,
Sorry for the delay and thank you for the fix. I finally got to testing your new macros. It looks OK from my point of view and is perfectly usable (except for the special case above, where I'm going to use the slightly more complicated query because of the 100 - diff).
So what are the next steps now?
Thanks