Restore `repo1-path` description is confusing
robbash opened this issue · 3 comments
Hi TSDB team,
we are running TimescaleDB single in our K8s cluster and have backups set up with S3.
As we move to a new cluster, we want to test the restore-from-backup use case, but we couldn't really make sense of the comment in the Helm values:
Are we supposed to set it to the path where the current backup is, so it can be found? In that case, how does it protect the backup from being overwritten? And if it needs to be something different, how can the backup location be found?
It would be great if the documentation were more explicit about its use.
Thanks!
I have also found the documentation about restoring from backup a bit confusing. After experimentation, this is what I found:
- When setting `backup.enabled: true`, the backup path within the S3 bucket is automatically set to `{KUBE_NAMESPACE}/{HELM_DEPLOYMENT_NAME}`.
- When restoring from backup (`bootstrapFromBackup.enabled: true`), you have the option of changing this path, so you can restore the backup from another deployment on the same S3 bucket.
- If you want to restore the backup to the same namespace/deployment: update the deployment with an empty storage volume, `bootstrapFromBackup.enabled: true`, and `repo1-path` set appropriately. (Note that you can keep `backup.enabled: true`; in that case it will first restore from backup, then resume backups to the same location.)
- If you want to restore the backup to another namespace/deployment: do your deployment with `bootstrapFromBackup.enabled: true` and `repo1-path` set to the namespace/deployment you want to restore from. You can also have `backup.enabled: true`; since the namespace/deployment name is different, the backup from the new deployment won't conflict with the old one.
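To make the second scenario concrete, here is a minimal sketch of the Helm values (the namespace `default`, deployment name `postgres-timescale`, and the exact nesting of `repo1-path` are assumptions to be checked against the chart's own `values.yaml`):

```yaml
# Sketch only: bootstrap a NEW deployment from the backup that an
# existing deployment ("postgres-timescale" in namespace "default")
# wrote to the shared S3 bucket.
backup:
  enabled: true            # new backups go to {new-namespace}/{new-deployment}
bootstrapFromBackup:
  enabled: true
  # Point at the OLD deployment's backup location inside the bucket:
  repo1-path: /default/postgres-timescale
```

Because the new deployment's own backup path is derived from its (different) namespace/deployment name, the restored-from location and the new backup location do not collide.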
Hi @bastienmenis. Thanks for sharing your insights! 👍
I have also tested further and can confirm your observations. My use case was migrating the TimescaleDB into a new K8s cluster, so I chose to keep the namespace and deployment name the same. Because I wasn't sure whether it's safe to have restore and backup point to the same location, I used different S3 buckets. With that, it helped a lot to use `bootstrapFromBackup.secretName`, where I overrode the values of `secrets.pgbackrest` that differed for the restore location.
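As a sketch of that approach (the secret name and the exact environment-variable keys are assumptions based on pgBackRest's `PGBACKREST_*` option naming, not taken from this thread, so verify them against `secrets.pgbackrest` in your deployment):

```yaml
# Hypothetical override secret, referenced via bootstrapFromBackup.secretName.
# It only overrides the S3 settings that differ for the restore source bucket;
# all other pgbackrest settings fall back to the regular secrets.pgbackrest.
apiVersion: v1
kind: Secret
metadata:
  name: pgbackrest-bootstrap        # assumed name
type: Opaque
stringData:
  PGBACKREST_REPO1_S3_BUCKET: old-cluster-backups     # restore source bucket
  PGBACKREST_REPO1_S3_KEY: "<access-key-id>"
  PGBACKREST_REPO1_S3_KEY_SECRET: "<secret-access-key>"
```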
I have the same goal, to bootstrap a second unique deployment into a new namespace from the first's S3 backup.
My attempt failed with the following logs. It looks like something (not sure what yet) is not actually finding the archive. The path referenced does exist in S3, and I can restore a new pod from it in the first cluster, so the credentials and files are correct.
```
Defaulted container "timescaledb" out of: timescaledb, tstune (init)
2023-11-01 20:32:47,569 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - no response
2023-11-01 20:32:47,575 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Sourcing /home/postgres/.pod_environment
2023-11-01 20:32:47 - restore_or_initdb - Attempting restore from backup
2023-11-01 20:32:47 - restore_or_initdb - Listing available backup information
WARN: environment contains invalid option 'backup-enabled'
stanza: poddb
status: error (missing stanza path)
WARN: environment contains invalid option 'backup-enabled'
WARN: repo1: [FileMissingError] unable to load info file '/default/postgres-timescale/backup/poddb/backup.info' or '/default/postgres-timescale/backup/poddb/backup.info.copy':
FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info' for read
FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore
2023-11-01 20:32:47.609 P00 INFO: restore command begin 2.44: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=29-2d699e81 --link-all --log-level-console=detail --pg1-path=/var/lib/postgresql/data --process-max=4 --repo1-cipher-type=none --repo1-path=/default/postgres-timescale --spool-path=/var/run/postgresql --stanza=poddb
2023-11-01 20:32:47.610 P00 INFO: restore command end: aborted with exception [075]
2023-11-01 20:32:47 - restore_or_initdb - Bootstrap from backup failed
```
It looks like there are a bunch of open bugs with the same issue. I'm already using a forked chart because the current release has a broken pgbackrest initialization procedure, so I could try to debug it from there.