ooni/sysadmin

labs.ooni.io down

darkk opened this issue · 1 comments

darkk commented

Impact: none? The labs SSL cert is expired and the data is not updated, but that's a "lab" for development & WIP stuff

Detection: alert from Prometheus

Timeline UTC:
12 Nov, Mon 15:55 GH opens a ticket about IP address renumbering with the 18th as the deadline (to move the network to another location, see #241), but the email about the ticket is lost due to a misclick
18 Nov, Sun 19:40 VM goes down, traceroutes towards the VM show route loops
19 Nov, Mon 09:07 mail from GH arrives confirming successful VM renumbering
19 Nov, Mon 09:39 @hellais updates DNS to reflect renumbering, VM goes back up
19 Nov, Mon 09:52 routing loop still exists in AS133752
19 Nov, Mon 10:38 routing loop has disappeared by this time
19 Nov, Mon 11:52 @darkk adds the on-renumber tag (see 4194617) because some configuration files cache IP addresses

What went wrong:

  • renumbering was confused with a routing issue

What went well:

  • default DNS TTL in the zone is 30m, which is a reasonable delay

What is still unclear:

  • should configuration files avoid IP address caches by all means?...

darkk commented

This incident is written up for historical purposes. I think it has no action points.
I believe that caching IP addresses for bind() directives is okay.
Maybe I'll change my opinion while doing the renumbering for #241.
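
For reference, a minimal sketch of how cached IP addresses could be spotted in configuration text before a renumbering. It is plain Python and only an illustration: the function name `find_cached_ips`, the nginx-like `SAMPLE` config and the hostname in it are hypothetical and not taken from this repo, and it only looks at IPv4 literals.

```python
import ipaddress
import re

# Candidate dotted quads; each one is validated below with the ipaddress module.
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_cached_ips(text):
    """Return (line_number, address) pairs for IPv4 literals found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _IPV4_CANDIDATE.finditer(line):
            try:
                addr = ipaddress.IPv4Address(match.group())
            except ipaddress.AddressValueError:
                continue  # looks like a dotted quad but is not a valid address
            if addr.is_loopback or addr.is_unspecified:
                continue  # 127.0.0.1 and 0.0.0.0 are not affected by renumbering
            hits.append((lineno, str(addr)))
    return hits


# Hypothetical config fragment (documentation-range address, made-up hostname).
SAMPLE = """\
server {
    listen 192.0.2.10:443 ssl;
    listen 127.0.0.1:8080;
    server_name labs.example.org;
}
"""

if __name__ == "__main__":
    for lineno, addr in find_cached_ips(SAMPLE):
        print(f"line {lineno}: {addr}")
```

Run against rendered config files, something like this would list the places that the on-renumber tag has to refresh. Loopback and wildcard binds are skipped on purpose, since they do not change when the VM is renumbered.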