Visualizations in Grafana end up hitting max buckets but the same visualization in Kibana works fine
What happened:
We've been working through the max buckets error in Grafana with an OpenSearch datasource. Initially I thought the issue was with OpenSearch itself, but we have raised the `search.max_buckets` limit to 65536 and we are still mostly seeing this error (some aggregations now work, but most still hit the limit and error). To compare, I recreated the same simple visualization in OpenSearch Dashboards (the Kibana equivalent) and I don't get any errors; it generates the visualization quickly. I suspect the OpenSearch plugin is building its queries differently than OpenSearch Dashboards does, which is causing it to hit this limit even with a high setting.
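For reference, a minimal sketch of how a `search.max_buckets` change like the 65536 value above is typically applied, using the standard OpenSearch cluster settings API (on a managed service such as AWS OpenSearch the setting may have to be changed through whatever settings access the service exposes):

```
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 65536
  }
}
```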
What you expected to happen:
Visualizations to work without hitting the max buckets limit.
How to reproduce it (as minimally and precisely as possible):
Create a simple aggregation in Grafana using an OpenSearch datasource.
Anything else we need to know?:
Environment:
- Grafana version: Grafana v11.2.0-73451
- OpenSearch version: AWS 2.13
- Plugin version: 2.17.1
Hi @rkarthikr, I tried running the query as shown in your screenshot but wasn't able to reproduce an error. Can you show what is in the query object by clicking on the Query Inspector and then on the `Query` tab in the inspector? The query will be listed in `data.queries`.
{
"traceId": "50384ed94095e8fe6eedfee4c020957a",
"request": {
"url": "api/ds/query?ds_type=grafana-opensearch-datasource&requestId=explore_o6v",
"method": "POST",
"data": {
"queries": [
{
"refId": "A",
"datasource": {
"type": "grafana-opensearch-datasource",
"uid": "ads67lnsevj0gd"
},
"query": "*",
"queryType": "lucene",
"alias": "",
"metrics": [
{
"type": "count",
"id": "1"
}
],
"bucketAggs": [
{
"type": "date_histogram",
"id": "2",
"settings": {
"interval": "auto"
},
"field": "startTime"
}
],
"format": "table",
"timeField": "startTime",
"luceneQueryType": "Traces",
"datasourceId": 12,
"intervalMs": 60000,
"maxDataPoints": 1515
}
],
"from": "1722177213354",
"to": "1722180813354"
},
"hideFromInspector": false
},
"response": {
"message": "An error occurred within the plugin",
"messageId": "plugin.downstreamError",
"statusCode": 500,
"traceID": "50384ed94095e8fe6eedfee4c020957a"
}
}
@kevinwcyu - Any updates on this?
Hi @rkarthikr! I've been investigating this. I haven't been able to reproduce it, but I have found some differences between the query that OpenSearch Dashboards runs and the one we create, and we'll continue to investigate why those differences exist and whether they affect performance.
I will reach out to you in Grafana Community Slack.
Hi @rkarthikr! You mention that you're getting max_buckets for this query, but I only see the plugin.downstreamError error. How did you discover this is a max buckets error and not an error in the plugin code? Thanks!
Saw the error in the OpenSearch logs. I tried increasing the max buckets config on the OpenSearch end and I no longer get this error, but I still get the `plugin.downstreamError` error with no additional details.
Please let me know; I'm happy to walk you through the demo environment to see if you can use it to collect data for further troubleshooting.
Hi @rkarthikr,
it would be super helpful to get a step-by-step on how to set up a similar environment, since it seems like our backend might be running into errors with the data itself. Thanks a lot!
- Demo application: https://github.com/open-telemetry/opentelemetry-demo/tree/main/kubernetes, deployed into an EKS cluster
- Updated the OTEL config to send traces to OpenSearch
- Set up the OpenSearch datasource in Grafana
- Using Explore with that datasource, queried trace data for a range greater than 5 minutes and saw the error
Please let me know if there is any way to enable Grafana logs that would help you troubleshoot this further. I am using the Grafana Cloud demo environment for this.
Did see the same error while trying to explore data for my new project (local Docker setup, opensearchproject/opensearch:2 and grafana/grafana:11.1.4). Only a few hundred messages produced the max buckets error.
Hi @rkarthikr, could you share the visualization from OpenSearch Dashboards (Kibana) that works? With the demo application, I still haven't been able to get an error related to the max bucket limit, but I do get the same error shown in the screenshot in the description when I perform a trace query.
I think the `plugin.downstreamError` error might be fixed by #445, while we still have to figure out what is causing the max bucket error.
Could it be the interval setting? I'm getting the same error sometimes (also with AWS OpenSearch) when setting the interval to auto but when I set it manually to a bigger number it works fine.
I can also see visually that the interval behavior is a bit different between Grafana and Kibana.
Hi @yotamN, there isn't an option to set the interval for `Traces` queries, so I just wanted to clarify whether you are running a `Traces` query as shown in the issue description or a `Metric` query?
On a second look, I think my error description was a bit off; please tell me if it's still relevant, since I do get the same error in the OpenSearch logs. I set the interval to a constant number (since there isn't a way to set a minimum interval instead), and when I query a big time range I get this error because there are too many buckets.
Hi @yotamN, we've seen the max bucket error for `Metric` queries in the past and we usually recommend adjusting the `search.max_buckets` setting in OpenSearch, but adjusting the interval is another way of tweaking the query to avoid hitting the error.
Since you mentioned you were setting the interval, I just wanted to clarify whether you were running a `Metric` query or a `Traces` query (like the one shown in the original issue description), because we haven't been able to reproduce the max bucket error for `Traces` queries yet. If it was a `Traces` query, it would be good to get an example query to help us reproduce it.
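For context on why interval tuning helps: the bucket count for a date histogram is roughly the time range divided by the interval, multiplied by the number of series produced by any terms aggregations, so 30 days at a 1m fixed interval is already 43,200 date buckets per series. A rough, illustrative sketch of that kind of aggregation (not the plugin's exact generated query; the `traces-*` index pattern is just a placeholder, and `startTime` is the time field from the query in the description):

```
POST traces-*/_search
{
  "size": 0,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "startTime",
        "fixed_interval": "1m",
        "min_doc_count": 0
      }
    }
  }
}
```

Once the per-series bucket count times the number of series exceeds `search.max_buckets`, OpenSearch rejects the search with a too-many-buckets error, which is what shows up in the OpenSearch logs.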