timescale/helm-charts

Restore: `repo1-path` description is confusing

robbash opened this issue · 3 comments

Hi TSDB team,

we are running timescaledb-single in our K8s cluster and have backups set up with S3.

As we move to a new cluster, we want to test the restore-from-backup use case, but we couldn't really make sense of the comment in the Helm values:

https://github.com/timescale/helm-charts/blob/7ded6b654c956a3f6dc119d90b47a0262eba600e/charts/timescaledb-single/values.yaml#L159C1-L159C1

Are we supposed to set it to the path where the current backup is, so it can be found? If so, how does that protect the backup from being overwritten? And if it needs to be something different, how do we find the backup location?

It would be great if the comment were more explicit about its intended use.

Thanks!

I also found the documentation about restoring from backup a bit confusing. After some experimentation, this is what I found:

  • When setting `backup.enabled: true`, the backup path within the S3 bucket is automatically set to `{KUBE_NAMESPACE}/{HELM_DEPLOYMENT_NAME}`.
  • When restoring from a backup (`bootstrapFromBackup.enabled: true`), you have the option of changing this path, so you can restore a backup taken by another deployment in the same S3 bucket.
  • If you want to restore the backup to the same namespace/deployment: update the deployment with an empty storage volume, `bootstrapFromBackup.enabled: true`, and `repo1-path` set appropriately. (Note that you can keep `backup.enabled: true`; in that case it first restores from the backup, then resumes backups to the same location.)
  • If you want to restore the backup to another namespace/deployment: do your deployment with `bootstrapFromBackup.enabled: true` and `repo1-path` set to the namespace/deployment you want to restore from. You can also keep `backup.enabled: true`; since the namespace/deployment name is different, backups from the new deployment won't conflict with the old ones.
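Putting the second scenario together, a values fragment for the new deployment would look roughly like this (a sketch only; the bucket layout follows the `{KUBE_NAMESPACE}/{HELM_DEPLOYMENT_NAME}` convention described above, and `old-namespace`/`old-deployment` are placeholders for wherever the source backup actually lives):

```yaml
# Hypothetical values.yaml fragment for the timescaledb-single chart:
# bootstrap a NEW deployment from an EXISTING deployment's S3 backup.
backup:
  enabled: true            # new backups go to {new-namespace}/{new-deployment}
bootstrapFromBackup:
  enabled: true
  # Path of the SOURCE backup inside the bucket, i.e. the old deployment's
  # auto-generated path: /{KUBE_NAMESPACE}/{HELM_DEPLOYMENT_NAME}
  repo1-path: /old-namespace/old-deployment
```

Because the new deployment's own backup path differs from `repo1-path`, the restored instance won't overwrite the source backup when it resumes taking backups.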

Hi @bastienmenis. Thanks for sharing your insights! 👍

I have also tested further and can confirm your observations. My use case was migrating the TSDB into a new K8s cluster, so I chose to keep the namespace and deployment name the same. Because I wasn't sure whether it's safe to have restore and backup point to the same location, I used different S3 buckets. With that, it helped a lot to use `bootstrapFromBackup.secretName`, where I overrode the values of `secrets.pgbackrest` that differed for the restore location.
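For reference, the override secret looked roughly like this. This is an illustrative sketch: the secret name and the exact keys are assumptions on my part; the keys should mirror whatever your `secrets.pgbackrest` actually contains, overriding only the values that differ for the restore bucket.

```yaml
# Hypothetical Secret referenced via bootstrapFromBackup.secretName.
# Keys mirror secrets.pgbackrest but point at the bucket holding the
# backup to restore from (bucket name and credentials are placeholders).
apiVersion: v1
kind: Secret
metadata:
  name: pgbackrest-bootstrap   # set bootstrapFromBackup.secretName to this
type: Opaque
stringData:
  PGBACKREST_REPO1_S3_BUCKET: old-cluster-backups
  PGBACKREST_REPO1_S3_KEY: <access-key-id>
  PGBACKREST_REPO1_S3_KEY_SECRET: <secret-access-key>
```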

I have the same goal: to bootstrap a second, distinct deployment in a new namespace from the first deployment's S3 backup.

My attempt failed with the following logs. It looks like something (not sure what yet) is not actually downloading the archive. The referenced path does exist in S3; I can restore a new pod from it in the first cluster, so the credentials and files are correct.

Defaulted container "timescaledb" out of: timescaledb, tstune (init)
2023-11-01 20:32:47,569 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - no response
2023-11-01 20:32:47,575 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Sourcing /home/postgres/.pod_environment
2023-11-01 20:32:47 - restore_or_initdb - Attempting restore from backup
2023-11-01 20:32:47 - restore_or_initdb - Listing available backup information
WARN: environment contains invalid option 'backup-enabled'
stanza: poddb
    status: error (missing stanza path)
WARN: environment contains invalid option 'backup-enabled'
WARN: repo1: [FileMissingError] unable to load info file '/default/postgres-timescale/backup/poddb/backup.info' or '/default/postgres-timescale/backup/poddb/backup.info.copy':
      FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info' for read
      FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info.copy' for read
      HINT: backup.info cannot be opened and is required to perform a backup.
      HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore
2023-11-01 20:32:47.609 P00   INFO: restore command begin 2.44: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=29-2d699e81 --link-all --log-level-console=detail --pg1-path=/var/lib/postgresql/data --process-max=4 --repo1-cipher-type=none --repo1-path=/default/postgres-timescale --spool-path=/var/run/postgresql --stanza=poddb
2023-11-01 20:32:47.610 P00   INFO: restore command end: aborted with exception [075]
2023-11-01 20:32:47 - restore_or_initdb - Bootstrap from backup failed

It looks like there are several open issues reporting the same problem. I'm already using a forked chart because the current release has a broken pgBackRest initialization procedure, so I could try to debug it from there.