Veritas InfoScale CSI provisioner with the velero-csi and AWS plugins: restore from snapshot fails
Shreyashirwadkar opened this issue · 5 comments
We are using the Velero CSI plugin with CSI snapshots enabled to create backups.
Below is the command we used to install Velero:
velero install --provider aws --features=EnableCSI --plugins=velero/velero-plugin-for-csi:v0.4.0,velero/velero-plugin-for-aws:v1.6.0 --bucket mybkt --secret-file ./credentials-velero --use-volume-snapshots=True --backup-location-config region=minio,s3ForcePathStyle=True,s3Url=http://xx.xx.xx.xx:9000,publicUrl=http://xx.xx.xx.xx:9000 --snapshot-location-config region=default,profile=default
We create a namespace backup with the velero backup command:
velero backup create postgres-backup-test --include-namespaces=postgres --wait
velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
postgres-backup-test Completed 0 0 2022-12-22 18:02:19 +0530 IST 29d default <none>
This backup creates the VolumeSnapshot and VolumeSnapshotContent correctly.
But when we delete the namespace and try to restore it from the backup, the snapshot is not created correctly, so the underlying PVC and pod go into the Pending state. We see the errors below in csi-snapshotter:
I1222 13:47:54.947439 1 connection.go:183] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1222 13:47:54.947443 1 connection.go:184] GRPC request: {}
I1222 13:47:54.948341 1 connection.go:186] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}}]}
I1222 13:47:54.948429 1 connection.go:187] GRPC error: <nil>
I1222 13:47:54.948434 1 connection.go:183] GRPC call: /csi.v1.Controller/ListSnapshots
I1222 13:47:54.948437 1 connection.go:184] GRPC request: {"snapshot_id":"snap_6k4f479bv8axq9hnu44n"}
I1222 13:47:55.310029 1 connection.go:186] GRPC response: {}
I1222 13:47:55.310089 1 connection.go:187] GRPC error: rpc error: code = Internal desc = parsing time "01:47:55 PM, +0000, UTC" as "2006-01-02 15:04": cannot parse "7:55 PM, +0000, UTC" as "2006"
E1222 13:47:55.310111 1 snapshot_controller.go:267] checkandUpdateContentStatusOperation: failed to call get snapshot status to check whether snapshot is ready to use "failed to list snapshot for content velero-velero-data-pvc-klw7f-rqzxh: \"rpc error: code = Internal desc = parsing time \\\"01:47:55 PM, +0000, UTC\\\" as \\\"2006-01-02 15:04\\\": cannot parse \\\"7:55 PM, +0000, UTC\\\" as \\\"2006\\\"\""
I1222 13:47:55.310121 1 snapshot_controller.go:143] updateContentStatusWithEvent[velero-velero-data-pvc-klw7f-rqzxh]
I1222 13:47:55.313204 1 snapshot_controller.go:189] updating VolumeSnapshotContent[velero-velero-data-pvc-klw7f-rqzxh] error status failed volumesnapshotcontents.snapshot.storage.k8s.io "velero-velero-data-pvc-klw7f-rqzxh" is forbidden: User "system:serviceaccount:infoscale-vtas:infoscale-csi-controller-17189" cannot patch resource "volumesnapshotcontents/status" in API group "snapshot.storage.k8s.io" at the cluster scope
I1222 13:47:55.313332 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"velero-velero-data-pvc-klw7f-rqzxh", UID:"f7b313d3-c1c8-4991-aa4a-c2d1e5a142f8", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"562474614", FieldPath:""}): type: 'Warning' reason: 'SnapshotContentCheckandUpdateFailed' Failed to check and update snapshot content: failed to list snapshot for content velero-velero-data-pvc-klw7f-rqzxh: "rpc error: code = Internal desc = parsing time \"01:47:55 PM, +0000, UTC\" as \"2006-01-02 15:04\": cannot parse \"7:55 PM, +0000, UTC\" as \"2006\""
E1222 13:47:55.313234 1 snapshot_controller.go:124] checkandUpdateContentStatus [velero-velero-data-pvc-klw7f-rqzxh]: error occurred failed to list snapshot for content velero-velero-data-pvc-klw7f-rqzxh: "rpc error: code = Internal desc = parsing time \"01:47:55 PM, +0000, UTC\" as \"2006-01-02 15:04\": cannot parse \"7:55 PM, +0000, UTC\" as \"2006\""
E1222 13:47:55.313401 1 snapshot_controller_base.go:265] could not sync content "velero-velero-data-pvc-klw7f-rqzxh": failed to list snapshot for content velero-velero-data-pvc-klw7f-rqzxh: "rpc error: code = Internal desc = parsing time \"01:47:55 PM, +0000, UTC\" as \"2006-01-02 15:04\": cannot parse \"7:55 PM, +0000, UTC\" as \"2006\""
I1222 13:47:55.313430 1 snapshot_controller_base.go:167] Failed to sync content "velero-velero-data-pvc-klw7f-rqzxh", will retry again: failed to list snapshot for content velero-velero-data-pvc-klw7f-rqzxh: "rpc error: code = Internal desc = parsing time \"01:47:55 PM, +0000, UTC\" as \"2006-01-02 15:04\": cannot parse \"7:55 PM, +0000, UTC\" as \"2006\""
oc get volumesnapshotcontents.snapshot.storage.k8s.io
NAME READYTOUSE RESTORESIZE DELETIONPOLICY DRIVER VOLUMESNAPSHOTCLASS VOLUMESNAPSHOT VOLUMESNAPSHOTNAMESPACE AGE
snapcontent-51de52a1-a24e-4bab-b3ae-a5033281606c true 1073741824 Retain org.veritas.infoscale csi-infoscale-snapclass velero-data-pvc-klw7f postgres 75m << VolumeSnapshotContent created during backup
velero-velero-data-pvc-klw7f-rqzxh Retain org.veritas.infoscale csi-infoscale-snapclass velero-data-pvc-klw7f postgres 32m << VolumeSnapshotContent created while restoring from backup
oc get volumesnapshotclasses.snapshot.storage.k8s.io
NAME DRIVER DELETIONPOLICY AGE
csi-infoscale-snapclass org.veritas.infoscale Retain 10d
csi-vsphere-vsc csi.vsphere.vmware.com Delete 14d
[root@bastion ~]# oc describe volumesnapshotclasses.snapshot.storage.k8s.io csi-infoscale-snapclass|grep -i label
Labels: velero.io/csi-volumesnapshot-class=true
f:labels:
Manager: kubectl-label
oc get volumesnapshots.snapshot.storage.k8s.io -n postgres
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
velero-data-pvc-klw7f false velero-velero-data-pvc-klw7f-rqzxh csi-infoscale-snapclass velero-velero-data-pvc-klw7f-rqzxh 32m
oc get volumesnapshotcontents.snapshot.storage.k8s.io --all-namespaces
NAME READYTOUSE RESTORESIZE DELETIONPOLICY DRIVER VOLUMESNAPSHOTCLASS VOLUMESNAPSHOT VOLUMESNAPSHOTNAMESPACE AGE
snapcontent-51de52a1-a24e-4bab-b3ae-a5033281606c true 1073741824 Retain org.veritas.infoscale csi-infoscale-snapclass velero-data-pvc-klw7f postgres 12d
velero-velero-data-pvc-klw7f-rqzxh Retain org.veritas.infoscale csi-infoscale-snapclass velero-data-pvc-klw7f postgres 12d
oc get volumesnapshots --all-namespaces
NAMESPACE NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
postgres velero-data-pvc-klw7f false velero-velero-data-pvc-klw7f-rqzxh csi-infoscale-snapclass velero-velero-data-pvc-klw7f-rqzxh 12d
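Separately from the time-parsing failure, the log also shows an RBAC denial: the controller's service account cannot patch volumesnapshotcontents/status at cluster scope, so the error status cannot even be recorded. A minimal sketch of the missing grant is below; the ClusterRole/ClusterRoleBinding names are illustrative (not from the InfoScale chart), while the service account name and namespace are taken from the log line:

```yaml
# Sketch of an RBAC grant covering the denied patch on
# volumesnapshotcontents/status; object names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: infoscale-snapshotter-status
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents/status"]
    verbs: ["update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: infoscale-snapshotter-status
subjects:
  - kind: ServiceAccount
    name: infoscale-csi-controller-17189   # from the log's "forbidden" message
    namespace: infoscale-vtas
roleRef:
  kind: ClusterRole
  name: infoscale-snapshotter-status
  apiGroup: rbac.authorization.k8s.io
```

This only fixes status reporting; the underlying ListSnapshots time-parsing error would still keep the restored snapshot from becoming ready.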
What did you expect to happen:
Restoring from a Velero backup should succeed. We tested an earlier release with CSI plugin v0.1.0 and restore worked correctly.
Environment:
Velero version (use velero version): 1.10
Velero features (use velero client config get features): Velero CSI, AWS
Kubernetes version (use kubectl version): Kubernetes Version: v1.24.0+dc5a2fd
Kubernetes installer & version:
Cloud provider or hardware configuration: OpenShift 4.11/4.10
OS (e.g. from /etc/os-release): coreos
But when we delete the namespace and try to restore it from the backup, the snapshot is not created correctly, so the underlying PVC and pod go into the Pending state. We see the errors below in csi-snapshotter

Why do you need to create a snapshot when you are doing a restore? Can you provide the restore command you ran?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.