kanisterio/kanister

StackGres blueprint support

mlavi opened this issue · 3 comments

Originally posted today at https://community.veeam.com/kasten-k10-support-92/recommanded-way-to-backup-postgres-stackgres-7265?postid=62222#post62222

We also use StackGres for our databases.

We recently tried Kasten and found that our database StatefulSets are not correctly recognized as ready by Kasten.

When I start a backup, it waits for 3 running replicas and reports:
3 replicas specified and only 0 are running
However, all 3 Pods inside the StatefulSet are running fine, and the SGCluster is healthy and works fine.
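
As a quick diagnostic sketch (assuming the SGCluster, the StatefulSet it creates, and the namespace are all named test, as in the errors below), the replica count Kasten waits on can be compared with the actual Pod and SGCluster state like this:

# Desired vs. ready replicas as reported on the StatefulSet object
kubectl -n test get statefulset test -o jsonpath='{.spec.replicas}{" "}{.status.readyReplicas}{"\n"}'

# Pods actually running in the namespace
kubectl -n test get pods

# Health as reported by the StackGres operator
kubectl -n test get sgcluster test -o yaml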

- cause:
    cause:
      cause:
        cause:
          cause:
            message: "Specified 3 replicas and only 0 are running: Context done while
              polling: context deadline exceeded"
          fields:
            - name: namespace
              value: test
            - name: name
              value: test
          file: kasten.io/k10/kio/exec/phases/phase/snapshot.go:426
          function: kasten.io/k10/kio/exec/phases/phase.WaitOnWorkloadReady
          linenumber: 426
          message: Statefulset not in ready state. Retry the operation once Statefulset is
            ready
        fields:
          - name: workloadName
            value: test
          - name: workloadNamespace
            value: test
        file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1158
        function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkloadWithSkipWait
        linenumber: 1158
        message: Error while waiting for workload to be ready
      fields:
        - name: workloadName
          value: test
        - name: workloadNamespace
          value: test
      file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1173
      function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkload
      linenumber: 1173
      message: Error while waiting for workload to become ready
    file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:372
    function: kasten.io/k10/kio/exec/phases/backup.processVolumeArtifacts
    linenumber: 372
    message: Error encountered waiting for workload
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "Specified 1 replicas and only 0 are running: could not get
              StatefulSet{Namespace: test, Name: test}: client rate limiter Wait
              returned an error: rate: Wait(n=1) would exceed context deadline"
          fields:
            - name: namespace
              value: test
            - name: name
              value: test
          file: kasten.io/k10/kio/exec/phases/phase/snapshot.go:426
          function: kasten.io/k10/kio/exec/phases/phase.WaitOnWorkloadReady
          linenumber: 426
          message: Statefulset not in ready state. Retry the operation once Statefulset is
            ready
        fields:
          - name: workloadName
            value: test
          - name: workloadNamespace
            value: test
        file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1158
        function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkloadWithSkipWait
        linenumber: 1158
        message: Error while waiting for workload to be ready
      fields:
        - name: workloadName
          value: test
        - name: workloadNamespace
          value: test
      file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1173
      function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkload
      linenumber: 1173
      message: Error while waiting for workload to become ready
    file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:372
    function: kasten.io/k10/kio/exec/phases/backup.processVolumeArtifacts
    linenumber: 372
    message: Error encountered waiting for workload
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "Specified 3 replicas and only 0 are running: could not get
              StatefulSet{Namespace: test, Name: test}: client rate limiter Wait
              returned an error: context deadline exceeded"
          fields:
            - name: namespace
              value: test
            - name: name
              value: test
          file: kasten.io/k10/kio/exec/phases/phase/snapshot.go:426
          function: kasten.io/k10/kio/exec/phases/phase.WaitOnWorkloadReady
          linenumber: 426
          message: Statefulset not in ready state. Retry the operation once Statefulset is
            ready
        fields:
          - name: workloadName
            value: test
          - name: workloadNamespace
            value: test
        file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1158
        function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkloadWithSkipWait
        linenumber: 1158
        message: Error while waiting for workload to be ready
      fields:
        - name: workloadName
          value: test
        - name: workloadNamespace
          value: test
      file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1173
      function: kasten.io/k10/kio/exec/phases/backup.WaitForWorkload
      linenumber: 1173
      message: Error while waiting for workload to become ready
    file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:372
    function: kasten.io/k10/kio/exec/phases/backup.processVolumeArtifacts
    linenumber: 372
    message: Error encountered waiting for workload
  message: Job failed to be executed

Thanks for opening this issue 👍. The team will review it shortly.

If this is a bug report, make sure to include clear instructions on how to reproduce the problem with minimal reproducible examples, where possible. If this is a security report, please review our security policy as outlined in SECURITY.md.

If you haven't already, please take a moment to review our project's Code of Conduct document.

Steps to reproduce:

  1. Prepare a cluster on vSphere with the vSphere CSI provider and VolumeSnapshotter (not sure whether the underlying cluster and storage provider are relevant, but we use an RKE2 cluster on top of vSphere).
  2. Set up the StackGres operator: https://stackgres.io/install/ (see the command sketch after this list).
  3. Set up a basic SGCluster:
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test
  namespace: test
spec:
  instances: 1
  pods:
    persistentVolume:
      size: 5Gi
  postgres:
    version: '16.2'
  profile: development
  4. Set up Kasten K10 (we connected the vSphere infrastructure and MinIO as the location for exports).
  5. Take a snapshot of the namespace where the SGCluster was created.
  6. The backup fails with the messages above.
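
For steps 2 and 3, a minimal command sketch (repository URL and chart name as given on the StackGres install page; double-check them against https://stackgres.io/install/ for your version, and note that the manifest filename below is just a placeholder):

# Install the StackGres operator via Helm
helm repo add stackgres-charts https://stackgres.io/downloads/stackgres-k8s/stackgres/helm/
helm install --create-namespace --namespace stackgres stackgres-operator stackgres-charts/stackgres-operator

# Create the target namespace and apply the SGCluster manifest shown above (saved as sgcluster-test.yaml)
kubectl create namespace test
kubectl apply -f sgcluster-test.yaml

# Wait until the cluster Pods are running, then run the K10 backup policy for the test namespace
kubectl -n test get pods -w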

@mlavi & @MSandro: It might be worth noting that this issue has been resolved by PR #3209, which is not yet available in a Kanister release; however, it is included in Kasten K10 release 7.0.14.