OnCall can't reach Grafana
sirvincent opened this issue · 6 comments
What went wrong?
We run Grafana OSS (version 11.2.2+security-01) on Ubuntu 22.04, installed via sudo apt install grafana.
We followed the OnCall installation guide for the hobby docker compose environment (without Grafana).
What happened:
- The last provisioning command of step (5):
curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/install'
returns:
{"message":"An error occurred within the plugin","messageId":"plugin.downstreamError","statusCode":500,"traceID":""}
The troubleshooting curl commands in the README return the same error.
The Grafana OnCall plugin page shows that Grafana is connected to OnCall (v1.11.3, OpenSource).
Side note: the version shown there is v1.11.3, but the plugin version in the top right is v1.6.2. Why? We also do not see an update button, as the README suggests we should.
However, the corresponding docker compose log from the engine shows 404 errors for several Grafana API endpoints. The relevant snippet from the log (I have obfuscated the IP as <IP>):
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.023862333997385576 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.013972447981359437 status=404 method=HEAD url=http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.013520489999791607 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006794679997256026 status=404 method=HEAD url=http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-incident-app/settings
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006319462001556531 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-incident-app/settings slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-labels-app/settings
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.012679944018600509 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-labels-app/settings slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-irm-app/settings
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.012594404979608953 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-irm-app/settings slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.user_management.sync RBAC status org=1 rbac_enabled=False
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006195642024977133 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.023239836998982355 status=200 method=GET url=http://<IP>:3000/api/org/users slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.017058505007298663 status=200 method=GET url=http://<IP>:3000/api/teams/search?perpage=1000000 slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.010913970996625721 status=200 method=GET url=http://<IP>:3000/api/teams/1/members slow=0
engine_1 | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.014364856004249305 status=200 method=GET url=http://<IP>:3000/api/teams/2/members slow=0
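To see at a glance which of these outbound calls failed, the engine log can be filtered for 4xx/5xx responses. The following is a minimal sketch, not an OnCall tool: failed_outbound is a hypothetical helper name, and it assumes the status=, method= and url= fields appear exactly as in the lines above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OnCall): read OnCall engine log lines on
# stdin and print "<status> <method> <url>" for every outbound request that
# returned a 4xx or 5xx status. Assumes the key=value layout shown above.
failed_outbound() {
  awk '
    / outbound / {
      status = method = url = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^status=/)      status = substr($i, 8)  # text after "status="
        else if ($i ~ /^method=/) method = substr($i, 8)  # text after "method="
        else if ($i ~ /^url=/)    url    = substr($i, 5)  # text after "url="
      }
      if (status + 0 >= 400) print status, method, url
    }
  '
}
```

For example, docker compose logs --no-color engine | failed_outbound would list the 404 responses shown above, one per line.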
What did you expect to happen:
- No errors, and working communication between the OnCall backend and Grafana
We have turned on the externalServiceAccounts feature toggle in grafana.ini as follows:
enable = externalServiceAccounts
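A toggle placed in the wrong section is easy to miss, so a quick check of the config file can help. Below is a minimal sketch of a hypothetical checker; it assumes the toggle is listed on an enable = line inside the [feature_toggles] section (separated by spaces or commas), and that the file lives at /etc/grafana/grafana.ini, the usual location for apt installs.

```bash
#!/usr/bin/env bash
# Hypothetical checker (not part of Grafana): succeed only if the given ini file
# lists externalServiceAccounts on an "enable =" line inside [feature_toggles].
has_external_sa_toggle() {
  awk '
    # A section header starts a new section; remember whether it is feature_toggles.
    /^[ \t]*\[/ {
      in_ft = ($0 ~ /^[ \t]*\[feature_toggles\][ \t]*$/)
      next
    }
    # Inside [feature_toggles], split the value of "enable =" into toggle names.
    in_ft && /^[ \t]*enable[ \t]*=/ {
      sub(/^[^=]*=/, "")
      n = split($0, toggles, /[ \t,]+/)
      for (i = 1; i <= n; i++)
        if (toggles[i] == "externalServiceAccounts") found = 1
    }
    # Exit status 0 only if the toggle was found.
    END { exit !found }
  ' "$1"
}
```

Usage: has_external_sa_toggle /etc/grafana/grafana.ini && echo enabled. Grafana reads grafana.ini at startup, so grafana-server needs a restart after the file changes.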
The issue seems similar to #1035, but the replies there did not help us.
How do we reproduce it?
Follow the README hobby installation guide without running Grafana in Docker; instead, use a local Grafana OSS installed via sudo apt install grafana.
Grafana OnCall Version
v1.11.3 OpenSource
Product Area
Helm/Kubernetes/Docker
Grafana OnCall Platform?
Docker
User's Browser?
No response
Anything else to add?
No response
We have managed to solve most of our problems. We think the cause was that we had installed an older version of OnCall (v1.6.2) in the past but never updated the plugin within Grafana. As mentioned in the issue, the update button is not visible even though I am logged in with an account that has admin rights (not the admin account itself), so we updated the plugin via the command-line interface:
grafana-cli plugins update-all
After that, we followed the steps in the README and got it mostly working.
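Since the root cause was a plugin (v1.6.2) older than the engine (v1.11.3), the update step can be guarded by a version comparison. This is a hypothetical sketch: the helper names are made up, the versions are passed in by hand (without the leading v), version_lt relies on GNU sort -V, and grafana-cli plugins update installs the latest published plugin release rather than one matching the engine exactly.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Grafana or OnCall).

# Succeed if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Update only the OnCall plugin when it is older than the engine, then restart
# Grafana so that the updated plugin is loaded.
update_oncall_plugin_if_older() {
  local plugin_version=$1 engine_version=$2
  if version_lt "$plugin_version" "$engine_version"; then
    sudo grafana-cli plugins update grafana-oncall-app
    sudo systemctl restart grafana-server
  fi
}
```

Example: update_oncall_plugin_if_older 1.6.2 1.11.3. Updating only grafana-oncall-app, instead of update-all, avoids upgrading unrelated plugins as a side effect.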
However, when setting up an integration with Grafana Alerting, we get the error message:
"Failed to update AlertManager Config"
The docker compose log from the engine shows:
engine_1 | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts
engine_1 | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=root outbound latency=0.007630477019120008 status=400 method=POST url=http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts slow=0
engine_1 | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=apps.alerts.grafana_alerting_sync_manager.grafana_alerting_sync GrafanaAlertingSyncManager: Failed to update contact point (POST) for is_grafana_datasource True; response: {'url': 'http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts', 'connected': False, 'status_code': 400, 'message': '400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts'}
We are not using the admin account but an account with admin privileges; or is that the same thing?
I do not know whether this belongs in this issue or whether a separate one should be opened.
With the latest releases I now get a new but similar Go error in Grafana. In the logs I see the following related error:
logger=plugin.grafana-oncall-app t=2024-10-21T11:18:25.293508052Z level=error msg="Error getting user" error="failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n"
logger=plugin.grafana-oncall-app t=2024-10-21T11:18:25.293526415Z level=error msg="Error validating oncall plugin settings" error="error setting up request headers: failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n "
and in OnCall I see the following error:
2024-10-21 11:30:02 source=engine:app google_trace_id=none logger=apps.auth_token.auth auth request user not found - missing valid X-Grafana-Context
@bpedersen2 thank you. Do your errors occur when using an https server? I have more success with http.
My earlier answer omitted that, to install the plugin correctly with the curl provisioning commands, I needed to use a service account token (the Grafana admin account did not work).
So for example:
curl -v -H "Authorization: Bearer glsa_xxyy" -X POST 'http://<IP>:3000/api/plugins/grafana-oncall-app/settings' -H "Content-Type: application/json" -d '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"http://localhost:8080/", "grafanaUrl":"http://<IP>:3000/"}}'
Followed with:
curl -v -H "Authorization: Bearer glsa_xxyy" -X POST 'http://<IP>:3000/api/plugins/grafana-oncall-app/resources/plugin/install'
Here glsa_xxyy is your service account token.
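The two calls above can be wrapped so that a failing step stops and still prints Grafana's error body. A minimal sketch, assuming curl 7.76 or newer (for --fail-with-body); the function names and the GRAFANA_SA_TOKEN variable are hypothetical, and stackId 5 / orgId 100 are simply the values from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two provisioning calls above.

# Build the JSON body for POST /api/plugins/grafana-oncall-app/settings.
oncall_settings_payload() {
  local oncall_api_url=$1 grafana_url=$2
  printf '{"enabled":true,"jsonData":{"stackId":5,"orgId":100,"onCallApiUrl":"%s","grafanaUrl":"%s"}}' \
    "$oncall_api_url" "$grafana_url"
}

# Save the plugin settings, then trigger the install, authenticating with the
# service account token in $GRAFANA_SA_TOKEN instead of admin:admin.
provision_oncall_plugin() {
  local grafana_url=$1 oncall_api_url=$2
  if [ -z "${GRAFANA_SA_TOKEN:-}" ]; then
    echo "GRAFANA_SA_TOKEN is not set" >&2
    return 1
  fi
  local auth="Authorization: Bearer ${GRAFANA_SA_TOKEN}"
  local base=${grafana_url%/}   # drop a trailing slash before appending paths
  # Step 1: save the plugin settings; --fail-with-body still prints the error body.
  curl -sS --fail-with-body -X POST -H "$auth" \
    -H "Content-Type: application/json" \
    -d "$(oncall_settings_payload "$oncall_api_url" "$grafana_url")" \
    "${base}/api/plugins/grafana-oncall-app/settings" || return 1
  # Step 2: trigger the install only if step 1 succeeded.
  curl -sS --fail-with-body -X POST -H "$auth" \
    "${base}/api/plugins/grafana-oncall-app/resources/plugin/install"
}
```

Usage: export GRAFANA_SA_TOKEN=glsa_xxyy, then run provision_oncall_plugin http://<IP>:3000/ http://localhost:8080/. The install call only runs if the settings call succeeded, mirroring the order above.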
However, as mentioned earlier, when setting up a Grafana integration we get "Failed to update AlertManager Config" and the engine logs shown earlier. The same error occurs with the Grafana admin account. I am now looking for a way to use that service account token during integration setup.
I tried setting SECRET_KEY in the docker_compose.yml equal to the service account token, inspired by this post, but without success:
SECRET_KEY : glsa_xxyy
In hindsight, the problem seems to be unrelated to access permissions; rather, a malformed request is sent when setting up the Grafana integration.
The external Grafana server logs show:
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=context userId=59 orgId=1 uname=sa-1-extsvc-grafana-oncall-app t=2024-10-23T00:15:07.091134615+02:00 level=info msg="Request Completed" method=GET path=/api/plugins/grafana-incident-app/settings status=404 remote_addr=<IP> time_ms=17 duration=17.908058ms size=64 referer= handler=/api/plugins/:pluginId/settings status_source=server
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=plugin.grafana-oncall-app t=2024-10-23T00:15:07.092111659+02:00 level=error msg="getting incident plugin settings" error="request did not return 200: 404"
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=context userId=59 orgId=1 uname=sa-1-extsvc-grafana-oncall-app t=2024-10-23T00:15:07.106578696+02:00 level=info msg="Request Completed" method=GET path=/api/plugins/grafana-labels-app/settings status=404 remote_addr=<IP> time_ms=13 duration=13.635132ms size=64 referer= handler=/api/plugins/:pluginId/settings status_source=server
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=plugin.grafana-oncall-app t=2024-10-23T00:15:07.10762901+02:00 level=error msg="getting labels plugin settings" error="request did not return 200: 404"
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=plugin.grafana-oncall-app t=2024-10-23T00:15:07.108752524+02:00 level=info msg=GetUser user="map[Email:admin@company.nl Login:admin Name:sysadmin Role:Admin]"
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=base.plugin.context t=2024-10-23T00:15:07.149239232+02:00 level=warn msg="Could not create user agent" error="invalid user agent format"
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=context userId=59 orgId=1 uname=sa-1-extsvc-grafana-oncall-app t=2024-10-23T00:15:07.253608104+02:00 level=error msg="bad request data" error="yaml: found invalid Unicode character escape code" remote_addr=<IP> traceID=
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=context userId=59 orgId=1 uname=sa-1-extsvc-grafana-oncall-app t=2024-10-23T00:15:07.253664383+02:00 level=info msg="Request Completed" method=POST path=/api/alertmanager/grafana/config/api/v1/alerts status=400 remote_addr=<IP> time_ms=5 duration=5.316209ms size=43 referer= handler=/api/alertmanager/grafana/config/api/v1/alerts status_source=server
Oct 23 00:15:07 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=context userId=1 orgId=1 uname=admin t=2024-10-23T00:15:07.257372712+02:00 level=info msg="Request Completed" method=POST path=/api/plugins/grafana-oncall-app/resources/alert_receive_channels/CXKN4JU9YFXWE/connect_contact_point/ status=400 remote_addr=<IP> time_ms=199 duration=199.266958ms size=49 referer="http://<IP>:3000/a/grafana-oncall-app/integrations/CXKN4JU9YFXWE?p=1&tab=monitoring-systems" handler=/api/plugins/:pluginId/resources/* status_source=server
Oct 23 00:15:08 imeasurements /usr/share/grafana/bin/grafana[3293851]: logger=base.plugin.context t=2024-10-23T00:15:08.576371694+02:00 level=warn msg="Could not create user agent" error="invalid user agent format"
The docker compose engine logs show:
engine_1 | 2024-10-22 22:15:07 source=engine:app google_trace_id=none logger=root outbound latency=0.045667638012673706 status=200 method=GET url=http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts slow=0
engine_1 | 2024-10-22 22:15:07 source=engine:app google_trace_id=none logger=root outbound latency=0.007832036004401743 status=200 method=GET url=http://<IP>:3000/api/alert-notifiers slow=0
engine_1 | 2024-10-22 22:15:07 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts
engine_1 | 2024-10-22 22:15:07 source=engine:app google_trace_id=none logger=root outbound latency=0.007206168957054615 status=400 method=POST url=http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts slow=0
engine_1 | 2024-10-22 22:15:07 source=engine:app google_trace_id=none logger=apps.alerts.grafana_alerting_sync_manager.grafana_alerting_sync GrafanaAlertingSyncManager: Failed to update contact point (POST) for is_grafana_datasource True; response: {'url': 'http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts', 'connected': False, 'status_code': 400, 'message': '400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts'}
That points to an incorrectly formatted string sent by OnCall.
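To pull errors like the yaml one out of the Grafana journal without reading every request line, the log can be filtered on the OnCall service account. A minimal sketch with a hypothetical helper name; the account name sa-1-extsvc-grafana-oncall-app is taken from the logs above and may differ on other setups, and the sed expression assumes the error text contains no embedded double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical filter (not part of Grafana): read Grafana server log lines on
# stdin and print the error="..." text of every level=error line logged for
# the OnCall service account user.
oncall_sa_errors() {
  grep -F 'uname=sa-1-extsvc-grafana-oncall-app' |
    grep -F 'level=error' |
    sed -n 's/.*error="\([^"]*\)".*/\1/p'
}
```

Usage: journalctl -u grafana-server --since "1 hour ago" | oncall_sa_errors. On the Grafana logs above it would print only the "yaml: found invalid Unicode character escape code" message.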
I am closing the issue.
Using a Grafana service account token in place of admin:admin in the curl calls grants the permissions needed to set up the connection when using an http address.
The failure I described earlier when setting up the Grafana Alerting integration in OnCall is caused by the use of Unicode characters (emojis) in contact points and notification templates (and possibly elsewhere). See also open issue #4653.
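Before connecting an integration, contact points and notification templates can be scanned for such characters. A minimal sketch with a hypothetical helper; it flags any byte outside ASCII, so it also catches accented letters, not only emoji.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Grafana or OnCall): succeed if the given
# text contains any byte outside the ASCII range 0x00-0x7F, e.g. an emoji.
# It deletes every ASCII byte and checks whether anything is left over.
has_non_ascii() {
  local leftover
  leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
  [ -n "$leftover" ]
}
```

For example, has_non_ascii "$(curl -sS -H "Authorization: Bearer glsa_xxyy" http://<IP>:3000/api/v1/provisioning/templates)" && echo "non-ASCII found", assuming the token's account may read the alerting provisioning API; the same check works on /api/v1/provisioning/contact-points.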