cloudsql_cross-region_failover_2022

Automating cross-region failovers in Google Cloud SQL with HA and Cascading Replicas

This project automates cross-region failover in Google Cloud SQL using HA Replicas and Cascading Replicas. If you are not using these features, please see my original script, which uses standard replicas.

Assumptions

Before running this playbook, please confirm you have the following architecture components in place:

  • You are using a High Availability Cloud SQL instance as your Primary Instance.
  • You have at least one Read Replica in a different GCP Region than your Primary Instance, and that Read Replica is configured for High Availability. If no such replica is provisioned before an incident, the failover automated in this playbook is not possible.
  • If you have additional Read Replicas serving read-only applications, those replicas are using Cascading Replication so they are not dependent on the Primary Instance for Replication.
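
The first two assumptions can be checked ahead of time from the command line. The following is a minimal sketch, not part of the repository's scripts: the function names and the instance names in the usage line are hypothetical. It relies on `gcloud sql instances describe` and on the `settings.availabilityType` field, which reports `REGIONAL` for an HA instance. It does not check the Cascading Replication assumption.

```shell
#!/usr/bin/env bash
# Sketch only: function and instance names are illustrative assumptions.

# Print one field of a Cloud SQL instance, e.g. "region".
instance_field() {
  gcloud sql instances describe "$1" --format="value($2)"
}

# Succeed only if the primary and the DR replica are both HA (REGIONAL)
# and are located in different regions.
check_dr_topology() {
  local primary="$1" replica="$2"
  local p_ha r_ha p_region r_region
  p_ha="$(instance_field "$primary" settings.availabilityType)"
  r_ha="$(instance_field "$replica" settings.availabilityType)"
  p_region="$(instance_field "$primary" region)"
  r_region="$(instance_field "$replica" region)"

  if [ "$p_ha" != "REGIONAL" ]; then
    echo "FAIL: primary '$primary' is not HA (availabilityType=$p_ha)"
    return 1
  fi
  if [ "$r_ha" != "REGIONAL" ]; then
    echo "FAIL: replica '$replica' is not HA (availabilityType=$r_ha)"
    return 1
  fi
  if [ "$p_region" = "$r_region" ]; then
    echo "FAIL: '$primary' and '$replica' are both in $p_region"
    return 1
  fi
  echo "OK: $primary ($p_region) -> $replica ($r_region), both HA"
}
```

Usage, with placeholder instance names: `check_dr_topology my-primary my-dr-replica`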

Example Architecture

[Image: example architecture diagram]

What does this script do?

The automated_DR_sql_failover script automates the process of failing a Cloud SQL instance over to a different GCP Region during a regional outage event. To accomplish this, the script automates the following based on user input:
  1. Selecting the right GCP Project that the instance resides in (user input)
  2. Capturing the Instance ID of the Primary Instance in the region that is down (user input)
  3. Capturing the Instance ID of the DR Read Replica that you want to fail over to (user input)
  4. Facilitating the failover by promoting the DR Instance in the new region to the primary writable Cloud SQL instance
  5. Providing the connection details for the newly promoted Instance
  6. Replacing the original Primary Instance with an HA Read Replica in the same Region/Zone for future failback procedures

Note: it is not necessary to recreate other downstream replicas if they are using Cascading Replication, because they are not reliant on the original Primary Instance for replication.
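
To make the six steps above concrete, here is a hedged sketch of the `gcloud` calls they roughly correspond to. It is not the contents of automated_DR_sql_failover; the function names, the argument order, the `-failback` name suffix, and the dry-run switch are all assumptions made for illustration. The real script collects its inputs through prompts, and the way it replaces the original Primary Instance in step 6 may differ from the delete-and-create shown here.

```shell
#!/usr/bin/env bash
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# failover PROJECT OLD_PRIMARY DR_REPLICA OLD_REGION [NEW_REPLICA_NAME]
# All arguments are placeholders supplied by the caller.
failover() {
  local project="$1" old_primary="$2" dr_replica="$3" old_region="$4"
  local new_replica="${5:-${old_primary}-failback}"  # hypothetical naming

  # Steps 1-3: project and instance IDs (prompted for in the real script).
  run gcloud config set project "$project"

  # Step 4: promote the DR replica to a standalone, writable primary.
  run gcloud sql instances promote-replica "$dr_replica" --quiet

  # Step 5: show connection details for the newly promoted instance.
  run gcloud sql instances describe "$dr_replica" \
    --format="value(connectionName,ipAddresses[0].ipAddress)"

  # Step 6 (assumed approach): remove the old primary, then create an HA
  # replica of the new primary in the original region for later failback.
  run gcloud sql instances delete "$old_primary" --quiet
  run gcloud sql instances create "$new_replica" \
    --master-instance-name="$dr_replica" \
    --region="$old_region" \
    --availability-type=REGIONAL
}
```

Running `failover my-project old-primary dr-replica us-central1` with the default `DRY_RUN=1` only prints the five commands, which makes it safe to review them before setting `DRY_RUN=0`.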

Failing back to your Primary Region

You can use the automated_post-DR_sql_failback script to conduct a controlled failover back to the original Region and Zone you used prior to the regional outage. This script follows a process very similar to the one described above, but it uses the HA Read Replica created in step #6 as the failover target and fully replaces all replicas.

Migration vs Disaster Recovery

This script was designed with Disaster Recovery in mind, but a planned "Regional Migration" follows the same process, so the script can be used for that scenario as well.

Using the script via Cloud Shell

To use this script via Cloud Shell or any other shell:
  • In the Google Cloud Console (console.cloud.google.com) or on your own device, open a terminal window
  • Clone the repository into your working directory
  • Run one of the scripts using "bash", for example: bash automated_DR_sql_failover.sh
  • Look for prompts and instructions in the terminal. It will guide you through the failover process.
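
When running outside Cloud Shell, it can help to confirm that the gcloud CLI is installed and has an active account before starting. The sketch below is an optional pre-flight check and is not part of the repository; the function name `preflight` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: a pre-flight check to run before either script.
preflight() {
  # The gcloud CLI must be on PATH (it is preinstalled in Cloud Shell).
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud CLI not found; install the Google Cloud SDK or use Cloud Shell" >&2
    return 1
  fi

  # An authenticated account must be active.
  local account
  account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ -z "$account" ]; then
    echo "no active gcloud account; run: gcloud auth login" >&2
    return 1
  fi

  echo "gcloud ready, active account: $account"
}
```

Usage: `preflight && bash automated_DR_sql_failover.sh`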