openstreetmap/operations

Scheduled loss of redundant power at Amsterdam

Equinix will be performing 5-year maintenance on the MDP-A/MCC-A power at AM6, which will switch one of the feeds off for up to 8 hours, from Saturday 11 June 20:00 to Sunday 12 June 04:00 UTC.

We should:

  • check that all power supplies are functional
  • note down any equipment that is switched off
  • verify that that equipment is still switched off after the maintenance work

https://hardware.openstreetmap.org/servers/dulcy.openstreetmap.org/ has a failed PSU and could experience an outage.
#513

Update: OpenStreetMap services failed last night during the power outage.

The initial understanding is that the Cisco RPS 2300 (dual redundant power feed device) failed, which brought down our switches and upstream network connection.

Once the AM6 power maintenance work was completed, power was restored and our switches returned to operation.

The outage lasted from 20:15 until 04:47 (UTC+1).

We believe the following happened:

  1. We have 2x Cisco SG550X switches (SW1 and SW2), connected to A and B power feeds respectively.
  2. Both Cisco SG550X switches are also connected to the Cisco RPS 2300, which is powered by both the A and B feeds. The Cisco RPS 2300 is meant to power a switch if that switch loses power on its primary feed (A or B respectively).
  3. Unexpectedly, SW1 (connected to the A power feed) lost power during the A power feed maintenance, and the Cisco RPS 2300 did not provide redundant power.
  4. It appears that, due to an unknown condition, the Cisco RPS 2300 can enter a supply standby mode in which it does not supply power to the Cisco SG550X devices. The supply can be activated via the front panel of the Cisco RPS 2300. We believe the Cisco RPS 2300 was in this standby state when power feed A was powered down.
  5. Only SW1 had the uplink to the Internet. With SW1 powered down because it was not being supplied with power, our uplink was down.
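
The failure chain above can be sketched as a small model. This is an illustrative simplification, not a description of how the Cisco RPS 2300 actually works internally: the names `Switch`, `powered`, `uplink_up` and the `rps_active` flag are hypothetical, and the RPS is assumed to be able to back up a switch whenever it is active and at least one feed is live.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    feed: str         # primary power feed, "A" or "B"
    has_uplink: bool  # True if this switch carries the Internet uplink


def powered(switch, live_feeds, rps_active):
    """A switch is up if its primary feed is live, or if the RPS is
    actively supplying power and still has at least one live feed."""
    if switch.feed in live_feeds:
        return True
    return rps_active and len(live_feeds) > 0


def uplink_up(switches, live_feeds, rps_active):
    """The site is reachable if any powered switch carries an uplink."""
    return any(s.has_uplink and powered(s, live_feeds, rps_active)
               for s in switches)


# Topology as described in the list above: only SW1 has the uplink.
SWITCHES = [Switch("SW1", "A", True), Switch("SW2", "B", False)]

# Feed A off for maintenance, RPS stuck in standby: uplink is lost.
print(uplink_up(SWITCHES, {"B"}, rps_active=False))

# Same maintenance window with the RPS actively supplying power.
print(uplink_up(SWITCHES, {"B"}, rps_active=True))
```

In this model the uplink survives the loss of feed A only if the RPS is active. Giving SW2 an uplink of its own (`has_uplink=True`) would also keep the site reachable with the RPS in standby, since SW2 stays on feed B.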

We are replacing the Cisco SG550X switches with Juniper switches (#656) and will get dual uplinks at AMS.