ooni/sysadmin

[FIRING] Lots of `scrape_samples_scraped` lost. Now ~ 2.83k, 24h ago ~ 27.96k.

hellais opened this issue · 1 comment

Since today we have been seeing this alert:

[FIRING] Lots of `scrape_samples_scraped` lost
Now ~ 2.83k, 24h ago ~ 27.96k.
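
For a sense of scale, the two figures in the alert can be turned into a relative loss. This is only a sketch of the arithmetic: `sample_loss_ratio` is a hypothetical helper, not part of the alerting setup, and the actual rule expression is not reproduced here.

```python
def sample_loss_ratio(now: float, day_ago: float) -> float:
    """Return the fraction of samples lost relative to the value 24h earlier.

    Hypothetical helper for illustration; the real alert rule may compute
    the comparison differently.
    """
    if day_ago <= 0:
        raise ValueError("day_ago must be positive")
    return 1.0 - now / day_ago


# Figures taken from the alert text: now ~2.83k, 24h ago ~27.96k.
loss = sample_loss_ratio(2.83e3, 27.96e3)
print(f"{loss:.1%}")  # prints 89.9%
```

In other words, roughly nine out of ten scraped samples disappeared compared with the previous day.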

Previously this alert was a signal for #343, but at a cursory glance metrics from hosts seem to be collected properly.

@SuperQ do you have other pointers on what we should look at to investigate this and verify whether it is indeed a problem?

This is the definition of the alert:

- alert: ScrapeSamplesLoss

It turned out that aa5e846 had the unintended consequence of breaking the rule in 78af081#diff-eaac1fdadca0a783965b2593ce5845f1R138, which broke all monitoring.

I have fixed it.