metrics count column overflow'ed 32 bit max int
Closed this issue · 2 comments
Issue Summary
We ended up outgrowing integer
for the metrics
/ count
column.
I'm not exactly sure what this column is used for/means. the alerta web UI and API continued to work but we stopped getting any new alert notifications, things got eerily quiet.
We started seeing these errors in postgres logs:
2023-01-16 15:44:27.370 EST [2780632] alertauser@alertadb ERROR: integer out of range
2023-01-16 15:44:27.370 EST [2780632] alertauser@alertadb STATEMENT:
INSERT INTO metrics ("group", name, title, description, count, total_time, type)
VALUES ('alerts', 'received', 'Received alerts', 'Total time and number of received alerts', 1, 28, 'timer')
ON CONFLICT ("group", name, type) DO UPDATE
SET count=metrics.count + 1, total_time=metrics.total_time + 28
RETURNING *
Confirmed we hit the max
alertadb=# select max(count), max(value), max(total_time) from metrics;
max | max | max
------------+------+-------------
2147483647 | 2639 | 42525454688
Resolved it with ALTER TABLE metrics ALTER COLUMN count TYPE BIGINT;
not sure if we just had this running long enough that we hit a limit - or if that 'count' should never get that high?
Environment
-
OS: Ubuntu 20.04
-
API version: 8.4.1
-
Deployment: self hosted
-
For self-hosted, WSGI environment: nginx/uwsgi
-
Database: Postgres
-
Server config:
Auth enabled? Yes
Auth provider? Basic
Customer views? No
To Reproduce
Not sure - run alerta server for a year or two?
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.
NOTE: Please provide as much information about your issue as possible.
Failure to provide basic details about your specific environment make
it impossible to know if an issue has already been fixed, can delay a
response and may result in your issue being closed without a resolution.