alerta/alerta

metrics count column overflow'ed 32 bit max int

Closed this issue · 2 comments

Issue Summary
We ended up outgrowing integer for the metrics / count column.

I'm not exactly sure what this column is used for/means. the alerta web UI and API continued to work but we stopped getting any new alert notifications, things got eerily quiet.

We started seeing these errors in postgres logs:

2023-01-16 15:44:27.370 EST [2780632] alertauser@alertadb ERROR:  integer out of range
2023-01-16 15:44:27.370 EST [2780632] alertauser@alertadb STATEMENT:
                    INSERT INTO metrics ("group", name, title, description, count, total_time, type)
                    VALUES ('alerts', 'received', 'Received alerts', 'Total time and number of received alerts', 1, 28, 'timer')
                    ON CONFLICT ("group", name, type) DO UPDATE
                        SET count=metrics.count + 1, total_time=metrics.total_time + 28
                    RETURNING *

Confirmed we hit the max

alertadb=# select max(count), max(value), max(total_time) from metrics;
    max     | max  |     max
------------+------+-------------
 2147483647 | 2639 | 42525454688

Resolved it with ALTER TABLE metrics ALTER COLUMN count TYPE BIGINT;

not sure if we just had this running long enough that we hit a limit - or if that 'count' should never get that high?

Environment

  • OS: Ubuntu 20.04

  • API version: 8.4.1

  • Deployment: self hosted

  • For self-hosted, WSGI environment: nginx/uwsgi

  • Database: Postgres

  • Server config:
    Auth enabled? Yes
    Auth provider? Basic
    Customer views? No

To Reproduce
Not sure - run alerta server for a year or two?

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

NOTE: Please provide as much information about your issue as possible.
Failure to provide basic details about your specific environment make
it impossible to know if an issue has already been fixed, can delay a
response and may result in your issue being closed without a resolution.

@sixcare can you resubmit your fix for this issue as the only change in the pull request and I will merge? Thanks

@sixcare can you resubmit your fix for this issue as the only change in the pull request and I will merge? Thanks

Done