Automate the backup process using BigQuery Snapshot
scan/main.py uses the BigQuery INFORMATION_SCHEMA metadata tables to find all tables (base tables) that need to be backed up. You can scan one or multiple projects, depending on where your source data is stored, and it will automatically find all tables that belong to a single region (or multi-region). The backup dataset name includes the source project name to avoid conflicts.
Each table is sent as a single message to a Pub/Sub topic; the message contains the information about the table to be backed up.
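As a rough illustration of the scan step, the sketch below lists base tables through INFORMATION_SCHEMA and publishes one message per table. It is not the actual scan/main.py: the function names, the message fields (`source_table`, `backup_project`, `backup_dataset`) and the exact dataset naming rule are assumptions made for this example.

```python
import json


def build_scan_query(project_id: str, region: str) -> str:
    """Return SQL listing every base table of one project in one region."""
    return (
        "SELECT table_catalog, table_schema, table_name "
        f"FROM `{project_id}`.`region-{region}`.INFORMATION_SCHEMA.TABLES "
        "WHERE table_type = 'BASE TABLE'"
    )


def backup_dataset_name(source_project: str, source_dataset: str) -> str:
    """Assumed naming rule: prefix the dataset with the source project name."""
    # Dataset IDs cannot contain hyphens, so project hyphens become underscores.
    return f"{source_project.replace('-', '_')}_{source_dataset}"


def build_message(backup_project: str, source_project: str,
                  dataset: str, table: str) -> bytes:
    """Serialise the (assumed) per-table payload as UTF-8 JSON bytes."""
    payload = {
        "source_table": f"{source_project}.{dataset}.{table}",
        "backup_project": backup_project,
        "backup_dataset": backup_dataset_name(source_project, dataset),
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def scan_and_publish(projects, region, backup_project, topic_id, function_project):
    """Query each project and publish one Pub/Sub message per base table."""
    # Imported lazily so the helpers above work without the client libraries.
    from google.cloud import bigquery, pubsub_v1

    bq = bigquery.Client(project=function_project)
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(function_project, topic_id)

    futures = []
    for project in projects:
        for row in bq.query(build_scan_query(project, region)).result():
            data = build_message(
                backup_project, row.table_catalog, row.table_schema, row.table_name
            )
            futures.append(publisher.publish(topic_path, data))

    # Wait for every publish to complete before the function returns.
    for future in futures:
        future.result()
    return len(futures)
```

Publishing one message per table is what lets the subscriber back up tables independently of each other.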
gcloud pubsub topics create tables_to_be_backedup
gcloud pubsub topics create scheduled_scan_for_backup
Run this under the scan folder:
export PROJECT_ID=YOUR_GCP_PROJECT_ID_TO_RUN_CLOUD_FUNCTION
export BACKUP_PROJECT_ID=YOUR_PROJECT_ID_WHERE_THE_BACKUPS_GO
export PROJECTS_TO_SCAN=PROJECTS_YOU_WANT_TO_SCAN
export TOPIC_ID=YOUR_PUBSUB_TOPIC_TO_SEND_TABLES_TO_BE_BACKED_UP
gcloud functions deploy scan_and_send_to_pubsub \
--runtime python38 \
--region europe-west2 \
--trigger-topic scheduled_scan_for_backup \
--ingress-settings internal-only \
--set-env-vars PROJECT_ID=${PROJECT_ID},BACKUP_PROJECT_ID=${BACKUP_PROJECT_ID},PROJECTS_TO_SCAN=${PROJECTS_TO_SCAN},TOPIC_ID=${TOPIC_ID}
The trigger/main.py Python component is the subscriber that backs up the individual BigQuery tables passed from the scan. This is very powerful because the backups of all tables can be handled concurrently, which greatly speeds up the backup process.
Error handling in this part is important: the function is deployed with retries enabled, so failing for a bad (non-transient) reason could result in retrying for a very long period of time and could lead to unexpected behaviour. This version of the code is designed to be idempotent.
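One way to get this behaviour is sketched below; it is not the actual trigger/main.py. The sketch assumes a JSON payload with the hypothetical fields `source_table`, `backup_project` and `backup_dataset`, assumes the backup dataset already exists, and derives the snapshot suffix from the event timestamp so that a retry targets the same snapshot name. Malformed messages are dropped without raising, so they are not retried, and `CREATE SNAPSHOT TABLE IF NOT EXISTS` makes a repeated delivery a no-op.

```python
import base64
import json

# Field names are assumptions for this sketch, not the project's real schema.
REQUIRED_KEYS = ("source_table", "backup_project", "backup_dataset")


def decode_message(event):
    """Return the payload dict, or None if the message is malformed."""
    try:
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    except (KeyError, TypeError, ValueError):
        return None
    if not isinstance(payload, dict) or any(k not in payload for k in REQUIRED_KEYS):
        return None
    return payload


def build_snapshot_sql(payload: dict, snapshot_date: str) -> str:
    """Build an idempotent snapshot statement for one source table."""
    table_name = payload["source_table"].split(".")[-1]
    target = (
        f"{payload['backup_project']}.{payload['backup_dataset']}."
        f"{table_name}_{snapshot_date}"
    )
    return (
        f"CREATE SNAPSHOT TABLE IF NOT EXISTS `{target}` "
        f"CLONE `{payload['source_table']}`"
    )


def backup_table(event, context=None):
    """Pub/Sub-triggered entry point (background function signature)."""
    payload = decode_message(event)
    if payload is None:
        # Returning normally acknowledges the message, so it is not retried.
        print("Dropping malformed message")
        return

    # Imported lazily so the helpers above work without the client library.
    from google.cloud import bigquery

    # The event timestamp stays the same across retries, unlike "today".
    snapshot_date = context.timestamp[:10].replace("-", "")
    sql = build_snapshot_sql(payload, snapshot_date)
    # Any exception raised here propagates and triggers a retry.
    bigquery.Client().query(sql).result()
```

With this split, only errors raised by the BigQuery call itself cause a retry, and those are the ones most likely to be transient.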
Run this under the trigger folder:
export PROJECT_ID=YOUR_GCP_PROJECT_ID_TO_RUN_CLOUD_FUNCTION
gcloud functions deploy backup_table \
--runtime python38 \
--region europe-west2 \
--trigger-topic tables_to_be_backedup \
--ingress-settings internal-only \
--set-env-vars PROJECT_ID=${PROJECT_ID} \
--retry
Very straightforward; see the docs here.