Possible improvements to Velero collector/analyzer
xavpaice opened this issue · 4 comments
Describe the rationale for the suggested feature.
For the Replicated team, a number of support issues raised are associated with Velero. The information in the Velero analyzer is useful, but not quite complete.
I would like to review the collector/analyzer for Velero, to see what improvements can be made that would have the most impact on our being able to solve support issues faster.
See https://github.com/vmware-tanzu/velero/issues/new?assignees=&labels=&projects=&template=bug_report.md for the kind of information the Velero maintainers ask for in a bug report.
If we are able to produce a useful support bundle and analysis, there's also an opportunity to discuss adding this to the Velero project as a diagnostic tool to help the maintainers.
First step:
- ensure the Troubleshoot Velero collector gathers all the info requested in a Velero bug report
- review support issues to find what info we needed to diagnose them, and what analysis could have highlighted the issue earlier
Second step:
- write individual issues for updates to Troubleshoot
Velero already has a velero debug command which collects a bunch of information.
The definition of done here is to:
- review the Velero analyzer in Troubleshoot
- check if there's missing information that would be helpful
- check that the analyzer is useful to our troubleshooting efforts with support issues
- produce detailed issues/stories for any changes that we recommend
Current info collected by velero debug bundle
[gerard@gerard-kurl ~]$ velero debug --backup instance-ggs98
2024/03/12 01:11:39 Collecting velero resources in namespace: velero
2024/03/12 01:11:40 Collecting velero deployment logs in namespace: velero
2024/03/12 01:11:40 Collecting log and information for backup: instance-ggs98
2024/03/12 01:11:41 Generated debug information bundle: /home/gerard/bundle-2024-03-12-01-11-39.tar.gz
[gerard@gerard-kurl ~]$ tar -tzf /home/gerard/bundle-2024-03-12-01-11-39.tar.gz
velero-bundle
velero-bundle/backup_describe_instance-ggs98.txt
velero-bundle/backup_instance-ggs98.log
velero-bundle/kubecapture
velero-bundle/kubecapture/core_v1
velero-bundle/kubecapture/core_v1/velero
velero-bundle/kubecapture/core_v1/velero/node-agent-5k98b
velero-bundle/kubecapture/core_v1/velero/node-agent-5k98b/node-agent
velero-bundle/kubecapture/core_v1/velero/node-agent-5k98b/node-agent/node-agent.log
velero-bundle/kubecapture/core_v1/velero/pods-202403120111.6465.json
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/replicated-kurl-util
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/replicated-kurl-util/replicated-kurl-util.log
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/replicated-local-volume-provider
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/replicated-local-volume-provider/replicated-local-volume-provider.log
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero/velero.log
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-aws
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-aws/velero-velero-plugin-for-aws.log
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-gcp
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-gcp/velero-velero-plugin-for-gcp.log
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-microsoft-azure
velero-bundle/kubecapture/core_v1/velero/velero-854f967b7f-btw9q/velero-velero-plugin-for-microsoft-azure/velero-velero-plugin-for-microsoft-azure.log
velero-bundle/kubecapture/velero.io_v1
velero-bundle/kubecapture/velero.io_v1/velero
velero-bundle/kubecapture/velero.io_v1/velero/backuprepositories-202403120111.2620.json
velero-bundle/kubecapture/velero.io_v1/velero/backups-202403120111.2612.json
velero-bundle/kubecapture/velero.io_v1/velero/backupstoragelocations-202403120111.2617.json
velero-bundle/kubecapture/velero.io_v1/velero/podvolumebackups-202403120111.2621.json
velero-bundle/kubecapture/velero.io_v1/velero/serverstatusrequests-202403120111.2623.json
velero-bundle/version.txt
Data collected are:
- Velero client and server version
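- Velero custom resources in the velero namespace (backups, backup repositories, backup storage locations, pod volume backups, server status requests)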
- Velero deployment logs
- Logs and describe output for a specific backup (if the --backup flag is provided). The describe output is verbose and includes the resource list
- Logs and describe output for a specific restore (if the --restore flag is provided). The describe output is verbose and includes the resource list
This data is sufficient to troubleshoot issues related to Velero backup/restore of snapshots.
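To compare bundles from different support cases, the tar listing above can be grouped by the kind of data each file carries. The following is a minimal sketch, not part of Velero or Troubleshoot: the function names and category labels are hypothetical, and it only matches the file naming patterns visible in the listing above (restore file names are not shown there, so they are not handled).

```python
import tarfile
from collections import defaultdict

# Hypothetical category labels. "describe" and "operation_logs" are only
# expected when velero debug was run with --backup.
EXPECTED = ("version", "describe", "operation_logs", "velero_resources", "pod_logs")


def list_bundle(path):
    """Return the member names of a velero debug .tar.gz bundle."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


def classify_bundle_entries(names):
    """Group bundle paths by category, using patterns seen in the listing."""
    groups = defaultdict(list)
    for name in names:
        base = name.rsplit("/", 1)[-1]
        if base == "version.txt":
            groups["version"].append(name)
        elif base.startswith("backup_describe_"):
            groups["describe"].append(name)
        elif base.startswith("backup_") and base.endswith(".log"):
            groups["operation_logs"].append(name)
        elif "/kubecapture/velero.io_v1/" in name and base.endswith(".json"):
            groups["velero_resources"].append(name)
        elif "/kubecapture/core_v1/" in name and base.endswith(".log"):
            groups["pod_logs"].append(name)
        # Directories and anything unrecognised are ignored.
    return dict(groups)


def missing_categories(groups):
    """Return the expected categories that have no files in the bundle."""
    return [category for category in EXPECTED if category not in groups]
```

For a bundle on disk, `missing_categories(classify_bundle_entries(list_bundle(path)))` gives a quick view of which kinds of data are absent.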
Notes on current Velero analyzer in Troubleshoot
- required to set velero/logs as name for podlog collector
- sample checks
Check PASS
Title: At least 1 Backup Repository configured
Message: Found 1 backup repositories configured and 1 Ready
------------
Check PASS
Title: Velero Logs analysis for kind [node-agent*]
Message: Found 1 log files
------------
Check WARN
Title: Velero logs for pod [/tmp/supportbundle3307708783/support-bundle-2024-03-12T05_13_27/velero/logs/velero-854f967b7f-btw9q/velero.log]
Message: Found error|panic|fatal in velero* pod log file(s)
------------
Check PASS
Title: Velero Logs analysis for kind [velero*]
Message: Found 6 log files
------------
Check PASS
Title: Velero Backups
Message: Found 2 backups
------------
Check PASS
Title: At least 1 Backup Storage Location configured
Message: Found 1 backup storage locations configured and 1 Available
------------
Check PASS
Title: Pod Volume Backups
Message: Found 1 pod volume backups
------------
Check PASS
Title: Velero Status
Message: Velero setup is healthy
------------
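When reviewing analyzer output across many support bundles, it helps to pull out only the checks that did not pass. The sketch below parses the plain-text format shown above (a "Check" line, then "Title:" and "Message:" lines, separated by dashes). The function names are hypothetical, and the format is assumed to match the sample exactly; this is not Troubleshoot's own code.

```python
def parse_checks(report):
    """Parse 'Check <STATUS>' / 'Title:' / 'Message:' blocks into dicts."""
    checks = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Check "):
            # A new check block starts; its status is the rest of the line.
            current = {"status": line[len("Check "):].strip(), "title": "", "message": ""}
            checks.append(current)
        elif current is not None and line.startswith("Title:"):
            current["title"] = line[len("Title:"):].strip()
        elif current is not None and line.startswith("Message:"):
            current["message"] = line[len("Message:"):].strip()
        # Separator lines made of dashes carry no data and are skipped.
    return checks


def needs_attention(checks):
    """Return every check whose status is anything other than PASS."""
    return [check for check in checks if check["status"] != "PASS"]
```

Run against the sample output above, `needs_attention` would return only the WARN entry for the velero pod log.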
Velero troubleshooting doc
https://github.com/vmware-tanzu/velero/blob/main/site/content/docs/main/troubleshooting.md
Replicated troubleshooting doc
https://docs.replicated.com/enterprise/snapshots-troubleshooting-backup-restore