0/1 Completed Lingering Jobs
What steps did you take and what happened:
I updated the Helm charts across my k3s clusters and everything seemed fine, but a few days later I noticed that job pods were not going away after their jobs had run:
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721254904505   1/1   6s      3h59m
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721258868468   1/1   12s     173m
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721263194387   1/1   3m20s   101m

velero   cert-manager-default-kopia-j6thr-maintain-job-17212588684659hdd   0/1   Completed   0   170m
velero   cert-manager-default-kopia-j6thr-maintain-job-1721263194385q8f4   0/1   Completed   0   98m
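For what it's worth, stock Kubernetes can garbage-collect finished Jobs through the TTL-after-finished controller (on by default since 1.23, k3s included), and the TTL can even be set on a Job after it has finished. This is only a generic sketch against one of the jobs above, not anything Velero does itself, and the 3600-second value is an arbitrary example:

# Give an already-finished Job a TTL; the TTL-after-finished controller
# then deletes the Job and its pods once the TTL expires.
kubectl -n velero patch job cert-manager-default-kopia-j6thr-maintain-job-1721254904505 \
  --type=merge -p '{"spec":{"ttlSecondsAfterFinished":3600}}'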
What did you expect to happen:
I expected these Job objects and their pods to be cleaned up after completion.
Anything else you would like to add:
When I look at the logs of those pods, there are some long errors that start with:
time="2024-07-17T23:27:55Z" level=warning msg="active indexes
and end with:
logSource="pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" sublevel=error
But ultimately:
time="2024-07-17T23:27:57Z" level=info msg="Finished quick maintenance." logModule=kopia/kopia/format logSource="pkg/kopia/kopia_log.go:94" logger name="[shared-manager]"
But everything appears to be working just fine; backups are being taken.
The above errors also appear in the controller pod.
I am backing up to MinIO.
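For reference, the same output can be pulled directly (the pod name is taken from the listing above; the grep string matches the logger name in those warnings):

# Logs of one of the lingering maintenance pods
kubectl -n velero logs cert-manager-default-kopia-j6thr-maintain-job-17212588684659hdd

# The same warnings show up in the Velero server pod
kubectl -n velero logs deploy/velero | grep "index-blob-manager"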
Environment:
Helm Chart Values:
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.10.0 # Versions are here: https://github.com/vmware-tanzu/velero-plugin-for-aws
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: cluster-velero
      credential:
        name: cluster-velero
        key: cloud
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: "http://minio:9000"
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: us-east-1
  defaultVolumesToFsBackup: true
credentials:
  useSecret: true
  existingSecret: cluster-velero
deployNodeAgent: true
nodeAgent:
  resources:
    requests:
      cpu: 20m
      memory: 64Mi
  tolerations:
    - effect: "NoExecute"
      operator: "Equal"
      value: "true"
      key: "CriticalAddonsOnly"
schedules:
  cluster:
    disabled: false
    schedule: "0 0 * * *"
    template:
      ttl: "2160h" # 90 days, expressed in hours
      storageLocation: default
      includedNamespaces:
        - "*" # Backup everything
- helm version (use helm version): v3.13.1
- helm chart version and app version (use helm list -n <YOUR NAMESPACE>): velero-7.1.1, app version 1.14.0
- Kubernetes version (use kubectl version):
  Client Version: v1.30.2
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.29.5+k3s1
- Kubernetes installer & version: k3s (v1.29.5+k3s1)
- Cloud provider or hardware configuration: Baremetal
- OS (e.g. from /etc/os-release): Ubuntu 23.10