scylladb/scylla-monitoring

I/O queue consumption graph is totally confusing

vladzcloudius opened this issue · 11 comments

Monitoring version: 4.6+
Dashboard: Advanced
Graph: "I/O queue all groups consumption"

Description

This graph was added in the context of #2088.
And there are a few issues with the current graph:

  1. It was explicitly stated by @xemul that this graph is meaningless on a per-shard level. However the current PromQL allows seeing it on a per-shard level:
sum(rate(scylla_io_queue_consumption{iogroup=~"$iogroup", class=~"$classes", instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]], class, iogroup)

Instead it should have been:

sum(rate(scylla_io_queue_consumption{iogroup=~"$iogroup", class=~"$classes", instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc"}[1m])) by (instance, class, iogroup)
  1. From the same definition from #2088 the user of this graph should watch out for a sum of these values for each io class for each node to not go above 1.0 (of 100%). However "stacking" stacks up all graphs for all nodes making this graph completely impossible to use unless you manually filter out each node:

image

And here is how the graph looks when we filter a single node only:
image

And if we remove stacking and select all nodes:
image

Now we clearly see a problematic behavior while stacking was muffling it.

Instead of using stacking (which obviously makes more bad than good here) we should sum by the io class for each node:

image

Plus we want to add a corresponding graph for each I/O class section to see which class is causing trouble.
Here is the expression for a "compaction" class (which was causing us a problem in an example above):
image

@vladzcloudius have you seen #2293?

@vladzcloudius have you seen #2293?

@vladzcloudius have you seen #2293?

No, I didn't but these two issues seem to complete each other.

No, I didn't but these two issues seem to complete each other.

Great, so I'll see how to get the best of the two for the new release

@vladzcloudius It will take a few back-and-forths, but we'll find the best solution at the end.

Thinking about it, the only interesting part is how scylla_io_queue_consumption is distributed over an entire node.
I can replace the single stacked panel with a panel per node, which will be a stacked graph.
This means that in a 30-node cluster, you'll get 30 panels unless you filter them.

If we want one graph for all nodes, it should sum over everything per instance, making it easier to see which nodes are around 100%. I'm not sure we need it.

For the io queue section, scylla_io_queue_consumption 2024.1 and higher has both a class tag and an iogroup tag.
Would you like to aggregate out the iogroup?
Do you want to see a node comparison (like in your example) where, for each (filtered) node, you'll get one line of the sum over that node?
How would you like (if any) the user's selection of shard, node, DC, or cluster view to be reflected in the graph?

@vladzcloudius It will take a few back-and-forths, but we'll find the best solution at the end.

Thinking about it, the only interesting part is how scylla_io_queue_consumption is distributed over an entire node. I can replace the single stacked panel with a panel per node, which will be a stacked graph. This means that in a 30-node cluster, you'll get 30 panels unless you filter them.

If we want one graph for all nodes, it should sum over everything per instance, making it easier to see which nodes are around 100%. I'm not sure we need it.

I'm sure we need it. ;)
But summing is not by instance: it's by instance and by iogroup:

sum(rate(scylla_io_queue_consumption{class=~"$classes", instance=~"[[node]]",cluster="$cluster", dc=~"$dc"}[1m])) by (iogroup, instance)

For the io queue section, scylla_io_queue_consumption 2024.1 and higher has both a class tag and an iogroup tag. Would you like to aggregate out the iogroup? Do you want to see a node comparison (like in your example) where, for each (filtered) node, you'll get one line of the sum over that node? How would you like (if any) the user's selection of shard, node, DC, or cluster view to be reflected in the graph?

Exactly as in my example - these graphs should always sum by (instance, iogroup) and never filter by shard. Note that "shard=~[[shard]]" is removed from the metric filter list in my examples.

I'll create some examples and add them. If we are doing instance and iogroup, you get a branch of graphs per instance. Maybe that's what you're after. Think about a 16—or even 32-cluster nodes panel. Would that be useful?

This is how a stack graph iogroup, class per node (using repeated panels looks like)
image

This is how 9 instance cluster, two io groups looks like on a single graph:
image

Is it helpful this way?

Stack is not going to be useful in any way unless you are looking at a single node. - So, let's drop stack completely.

We always want to see all instances on a single graph.

Also, please, note that there is another label you need to sum over: stream.
So the final expression is going to be as follows:

sum(rate(scylla_io_queue_consumption{class=~"$classes", instance=~"[[node]]",cluster="$cluster", dc=~"$dc"}[1m])) by (iogroup, instance, stream)

@vladzcloudius you'll get it anyway it will be helpful:
First, don't you think there will be too many lines on the same graph?
Second, don't you think that a repeated panel per instance will be helpful?
Third, in your example, class is part of the filter but not part of the by labels, which means that when someone changes the filter choices, the graph will have the same number of lines but with different values and without a good indication of why.

@vladzcloudius you'll get it anyway it will be helpful: First, don't you think there will be too many lines on the same graph?

There will be one line for <each node>*<each stream (by default 1)>*<for each io_group, by default 1 per NUMA>.
So, it's going to be fine.

Second, don't you think that a repeated panel per instance will be helpful?

No, I don't think so. We are always looking for outliers. We can always filter specific nodes and specific io classes (e.g. "compaction") if needed.

Third, in your example, class is part of the filter but not part of the by labels, which means that when someone changes the filter choices, the graph will have the same number of lines but with different values and without a good indication of why.

Yup. This is intentional. This allows you to find outlier nodes for a specific io class.