alerta/alerta

Upgrade from Alerta 8.5.0 to 9.0.0 resulting in "Failed to get config from http://localhost:8080/config. Reason: Expecting value: line 1 column 1 (char 0)"

Closed this issue · 4 comments

Issue Summary
I have been running Alerta v8.5.0 in a Kubernetes cluster using the docker-alerta image. I am trying to upgrade to v9.0.0, but the GUI fails to load. The config.json file loads fine, but the request to the api/config endpoint ends in a 504 nginx timeout (the browser stays blank white; the failure is only visible in the browser's network dev tools). Most bemusingly, it sometimes works right after start-up, i.e. the GUI loads as expected with alerts, and then fails when the page is refreshed.

I think this is an issue with the server setup, but I'm not sure what I've done wrong (or what has changed between 8.5.0 and 9.0.0 that would cause this).

I'm seeing the following log on startup:

unable to load app 0 (mountpoint='') (callable not found or import error)

and then repeatedly (amongst other generic logs):

Failed to get config from http://localhost:8080/config. Reason: Expecting value: line 1 column 1 (char 0)

and

[error] 37#37: *1694 upstream timed out (110: Connection timed out) while connecting to upstream, client: 127.0.0.6, server: , request: "POST /api/webhooks/prometheus HTTP/1.1", upstream: "uwsgi://127.0.0.1:29000", host: "<CLUSTER_SERVICE_NAME>.local:80"

(where I've removed the cluster service name, but this is a standard kubernetes service).

Curling localhost:8080/config, I appear to get the contents of index.html.
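
For context, "Expecting value: line 1 column 1 (char 0)" is the message Python's json module produces when the very first character of the body is not valid JSON, which is what happens when an HTML page comes back instead of a JSON document. A minimal sketch of that failure follows; the HTML string and the `try_parse_config` helper are hypothetical stand-ins, not code from the Alerta client:

```python
import json

# Hypothetical stand-in for the HTML page returned by localhost:8080/config.
html_body = "<!DOCTYPE html><html><head><title>Alerta</title></head></html>"


def try_parse_config(body):
    """Return (config, None) on success or (None, error message) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # str(exc) has the form "<msg>: line N column M (char P)".
        return None, str(exc)


config, error = try_parse_config(html_body)
print(error)
```

A JSON body such as the config.json content `{"endpoint": "/api"}` parses without error through the same helper.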

Environment

  • OS: Linux
  • API version: 9.0.0
  • Deployment: Docker/Kubernetes

Versions:

  • Alerta Server 9.0.0

  • Alerta Client 8.5.2

  • Alerta WebUI 8.7.0

  • nginx version: nginx/1.22.0

  • uwsgi 2.0.21

  • MongoDB shell version v4.2.24

  • psql (PostgreSQL) 11.19 (Debian 11.19-0+deb10u1)

  • Python 3.8.16

  • Database: MongoDB

  • Server config:
    Auth enabled? No

The config.json file has:

{"endpoint": "/api"}

and this is in the same folder as index.html.

The uwsgi.ini file (cat app/uwsgi.ini) has:

[uwsgi]
chdir = /app
module = wsgi
manage-script-name = true
mount = /api=wsgi:app
master = true
processes = 5
listen = 100
max-worker-lifetime = 30

socket = 127.0.0.1:29000
buffer-size = 8192
chmod-socket = 664
uid = alerta
gid = root
vacuum = true

die-on-term = true
disable-logging = True

i.e. the default.

The alertad.conf file has:

DATABASE_RAISE_ON_ERROR = False
SIGNUP_ENABLED = False
AUTH_REQUIRED = False
DEBUG = True
COLUMNS = ['severity', 'status', 'createTime', 'lastReceiveTime', 'resource', 'event', 'service', 'effect']

DATABASE_NAME =  <DB_NAME>
DATABASE_URL = <DB_URL>

where I've just removed the database name / url here for security reasons.

I'm sure this is a misconfiguration on my part, but I'm struggling to see where or why.

Some more info:

It seems that we are able to access the GUI and everything works as expected until housekeeping or heartbeats hits the "Failed to get config from http://localhost:8080/config. Reason: Expecting value: line 1 column 1 (char 0)" error. After this, both of those processes continually hit the error and restart.

After a while (presumably because housekeeping is failing) we get:

2023-04-27 16:04:19,070 DEBG 'uwsgi' stdout output:
Thu Apr 27 16:04:19 2023 - *** uWSGI listen queue of socket "127.0.0.1:29000" (fd: 3) full !!! (101/100) ***
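
The (101/100) in that line is the current queue depth against the `listen = 100` limit from uwsgi.ini. When reproducing, it can help to pull those numbers out of the container logs. A rough sketch, assuming the log line keeps the format shown above; the regex and the `parse_listen_queue_full` helper are illustrative and not part of uWSGI or Alerta:

```python
import re

# Pattern written against the single log line quoted above (an assumption,
# not an official uWSGI log format specification).
LISTEN_QUEUE_RE = re.compile(
    r'listen queue of socket "(?P<socket>[^"]+)" \(fd: \d+\) full !!! '
    r"\((?P<depth>\d+)/(?P<limit>\d+)\)"
)


def parse_listen_queue_full(line):
    """Return (socket, depth, limit) if the line reports a full listen queue, else None."""
    match = LISTEN_QUEUE_RE.search(line)
    if match is None:
        return None
    return match.group("socket"), int(match.group("depth")), int(match.group("limit"))


sample = (
    'Thu Apr 27 16:04:19 2023 - *** uWSGI listen queue of socket '
    '"127.0.0.1:29000" (fd: 3) full !!! (101/100) ***'
)
print(parse_listen_queue_full(sample))
```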

Is it possible that an empty / malformed alert has made its way onto the queue and is causing this issue?

Update: We are seeing timeouts on all of the API endpoints. I can now reproduce this consistently: the GUI is accessible when the container spins up, and failures start after roughly 30 seconds to 1 minute. Something seems to make the server unresponsive and/or crash after that time. Any ideas?

mfyll commented

Hello, did you manage to solve this?

@mfyll
The decode error "Expecting value: line 1 column 1 (char 0)" occurred because the default endpoint for alerta/python-alerta-client, which the housekeeping and heartbeats processes run with, is not correct for this setup. In the alerta.conf we had to add the lines:

[DEFAULT] 
endpoint = http://localhost:8080/api
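
To sanity-check which URL the client will end up requesting, the snippet below reads the same two lines with Python's configparser and builds the config URL. This is a sketch under one assumption: that the client appends /config to the configured endpoint, which is what the "Failed to get config from http://localhost:8080/config" log suggests. The variable names are illustrative only:

```python
import configparser

# The two lines added to alerta.conf, inlined here so the sketch is self-contained.
conf_text = """
[DEFAULT]
endpoint = http://localhost:8080/api
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)

# Values under [DEFAULT] are exposed through defaults().
endpoint = parser.defaults()["endpoint"]

# Assumption: the config URL is the endpoint with "/config" appended.
config_url = endpoint.rstrip("/") + "/config"
print(config_url)
```

With the endpoint set this way, the request goes to the /api mount from uwsgi.ini rather than to the web UI root.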

Once we fixed that, it was still failing, but now with the 504 gateway errors. These are caused by this issue with docker-alerta:
alerta/docker-alerta#374
which has this pending fix:
alerta/docker-alerta#435

So we are waiting for that fix to be merged & tagged.

Fixes merged.