alerta/alerta

Unable to change alert status - 409 error

Closed this issue · 3 comments

Issue Summary

We frequently see the error "Invalid Action for current x status (409)", where x is "closed", "shelved" or "ack", when trying to modify the status of Open alerts through the Web UI. The alerts can be deleted, but their status cannot be modified - they continue to appear with Open status in the Web UI.

We haven't been able to reproduce this reliably.

Environment

  • OS: Linux

  • API version: 8.5.0

  • Deployment: Azure Kubernetes Service

  • Database: Azure CosmosDB

  • Server config: Output of config endpoint: alerta-config.txt

  • web UI version: 8.5.0

  • CLI version: 8.5.0

To Reproduce
Steps to reproduce the behavior:

Unknown - this affects some alerts but not others. We have not been able to reliably work out what the difference is yet.

Expected behavior

Closing, acking, or shelving alerts in the Web UI should work reliably rather than returning an error message.

Thanks for reporting this, as others may be experiencing the same issue. If you get any more information, please add it to this ticket.

Thanks Nick. We've traced the problem back to having two files (documents) in the Alerta MongoDB backend with an identical environment, resource, event, and customer - one file records the alert status as "open" (henceforth File 1) and the other records its status as "closed" (henceforth File 2). I expect that the Alerta GUI displays the "open" alert found in File 1, but any attempt to close it is blocked because File 2 is already in "closed" status.

I'm trying to work out how the alert got into this state. Looking through the histories for the two files:

  • File 1 is created at time 1680179012560
  • File 1 has a normal cycle of "Open -> Closed -> Open"
  • The following three events happen within 1.5 seconds:
    • At time 1683807140197, File 1 records the status "Open"
    • At time 1683810740607, File 2 is created and records the status "Closed"
    • At time 1683810741692, File 1 records the status "Closed"
  • Thereafter, all "Open" events are recorded in File 1 and all "Close" events are recorded in File 2. "Close" and "Open" events always alternate in time.
  • This alternation stops in the most recent 4 events. The most recent 4 events are as follows:
    • At time 1686318033207, File 1 records the status "Open"
    • At time 1686318034272, File 1 records the status "Open" (File 1's current status, not in history)
    • At time 1686318622804, File 2 records the status "Closed"
    • At time 1686318623197, File 2 records the status "Closed" (File 2's current status, not in history)
  • I believe it was at this point that attempting to "close" the alert started to result in 409 errors.

I don't know what the expected behaviour is here - is it normal to have two separate files recording history for the same alert? (where I'm defining an "alert" as a unique environment, resource, event and customer)
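For anyone wanting to check whether their own database is in the same state, here is a minimal sketch that groups alert documents by the four identifying fields and reports any key held by more than one document. It works on plain dictionaries, so it can be run against an export of the alerts collection. The field names `status` and `_id`, the function name, and the pipeline constant are assumptions for illustration, not something taken from the Alerta code.

```python
from collections import defaultdict

# Fields that together identify an "alert", per the definition used in this thread.
ALERT_KEY_FIELDS = ("environment", "resource", "event", "customer")


def find_duplicate_alerts(documents):
    """Return {identity key: [documents]} for every key shared by 2+ documents."""
    groups = defaultdict(list)
    for doc in documents:
        # Missing fields are treated as None so they still group together.
        key = tuple(doc.get(field) for field in ALERT_KEY_FIELDS)
        groups[key].append(doc)
    return {key: docs for key, docs in groups.items() if len(docs) > 1}


# The same check written as a MongoDB aggregation pipeline (hypothetical name).
# It could be passed to an aggregate() call on the alerts collection; whether
# every stage behaves identically on Azure CosmosDB is not verified here.
DUPLICATE_PIPELINE = [
    {"$group": {
        "_id": {field: f"${field}" for field in ALERT_KEY_FIELDS},
        "ids": {"$push": "$_id"},
        "statuses": {"$push": "$status"},  # assumes a top-level "status" field
        "count": {"$sum": 1},
    }},
    {"$match": {"count": {"$gt": 1}}},
]
```

Any key returned by `find_duplicate_alerts` corresponds to the "File 1 / File 2" situation described above.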

In our deployment we generally have multiple instances of Alerta running against the same database backend - I wouldn't be surprised if this is an issue relating to two different Alerta instances trying to update an alert at the same time.

After a bit more digging, it turns out that indexes had not been created properly on the database. Specifically, this line, which enforces uniqueness of Alert documents, had not taken effect - the index was not present on the databases that experienced this problem.
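A rough sketch of how the missing index could be detected and recreated, assuming a pymongo-style collection object. `index_information()` and `create_index(..., unique=True)` are real pymongo `Collection` methods; the helper names, and the assumption that the unique index covers exactly these four fields, are mine and should be checked against the Alerta source for your version.

```python
# Assumed field set for the uniqueness index; verify against the Alerta source.
ALERT_KEY_FIELDS = ("environment", "customer", "resource", "event")


def has_unique_alert_index(index_info):
    """Check a mapping shaped like pymongo's Collection.index_information().

    Returns True if some index is unique and covers exactly the alert key fields.
    """
    for spec in index_info.values():
        fields = [name for name, _direction in spec.get("key", [])]
        if spec.get("unique") and sorted(fields) == sorted(ALERT_KEY_FIELDS):
            return True
    return False


def ensure_unique_alert_index(collection):
    """Create the unique compound index if absent; return its name, else None.

    On MongoDB, index creation is expected to fail while duplicate documents
    still exist, so duplicates need to be merged or removed first.
    """
    if has_unique_alert_index(collection.index_information()):
        return None
    # 1 means ascending order (the value of pymongo.ASCENDING).
    return collection.create_index(
        [(field, 1) for field in ALERT_KEY_FIELDS], unique=True
    )
```

With pymongo this would be called as `ensure_unique_alert_index(db.alerts)`, where `db.alerts` is assumed to be the alerts collection. How Azure CosmosDB handles unique indexes on a collection that already contains data is not covered by this sketch.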

The databases were created a long time ago so we haven't been able to work out exactly what happened, but it's probably a symptom of the imperfect Azure CosmosDB implementation of MongoDB. There are other symptoms of this, which I've created a separate issue (#1868) to track.