0/1 Completed Lingering Jobs
What steps did you take and what happened:
I updated the Helm charts across my k3s clusters and everything seemed fine, but a few days later I noticed that job pods were not going away after their jobs had run:
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721254904505   1/1   6s      3h59m
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721258868468   1/1   12s     173m
job.batch/cert-manager-default-kopia-j6thr-maintain-job-1721263194387   1/1   3m20s   101m

velero   cert-manager-default-kopia-j6thr-maintain-job-17212588684659hdd   0/1   Completed   0   170m
velero   cert-manager-default-kopia-j6thr-maintain-job-1721263194385q8f4   0/1   Completed   0   98m
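For what it's worth, stock Kubernetes can garbage-collect finished Jobs through the TTL-after-finished controller (on by default since 1.23, k3s included), and the TTL can even be set on a Job after it has finished. This is only a generic sketch against one of the jobs above, not anything Velero does itself, and the 3600-second value is an arbitrary example:

# Give an already-finished Job a TTL; the TTL-after-finished controller
# then deletes the Job and its pods once the TTL expires.
kubectl -n velero patch job cert-manager-default-kopia-j6thr-maintain-job-1721254904505 \
  --type=merge -p '{"spec":{"ttlSecondsAfterFinished":3600}}'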
What did you expect to happen:
I expected these Job objects and their pods to be cleaned up after completion.
Anything else you would like to add:
When I look at the logs of those pods, there are some long errors that start with:
time="2024-07-17T23:27:55Z" level=warning msg="active indexes
and end with:
logSource="pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" sublevel=error
But ultimately:
time="2024-07-17T23:27:57Z" level=info msg="Finished quick maintenance." logModule=kopia/kopia/format logSource="pkg/kopia/kopia_log.go:94" logger name="[shared-manager]"
But everything appears to be working just fine; backups are being taken.
The above errors also appear in the controller pod.
I am backing up to MinIO.
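For reference, the same output can be pulled directly (the pod name is taken from the listing above; the grep string matches the logger name in those warnings):

# Logs of one of the lingering maintenance pods
kubectl -n velero logs cert-manager-default-kopia-j6thr-maintain-job-17212588684659hdd

# The same warnings show up in the Velero server pod
kubectl -n velero logs deploy/velero | grep "index-blob-manager"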
Environment:
Helm Chart Values:
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.10.0 # Versions are here: https://github.com/vmware-tanzu/velero-plugin-for-aws
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: cluster-velero
      credential:
        name: cluster-velero
        key: cloud
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: "http://minio:9000"
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: us-east-1
  defaultVolumesToFsBackup: true
credentials:
  useSecret: true
  existingSecret: cluster-velero
deployNodeAgent: true
nodeAgent:
  resources:
    requests:
      cpu: 20m
      memory: 64Mi
  tolerations:
    - effect: "NoExecute"
      operator: "Equal"
      value: "true"
      key: "CriticalAddonsOnly"
schedules:
  cluster:
    disabled: false
    schedule: "0 0 * * *"
    template:
      ttl: "2160h" # 90 days, expressed in hours
      storageLocation: default
      includedNamespaces:
        - "*" # Backup everything
- helm version (use helm version): v3.13.1
- helm chart version and app version (use helm list -n <YOUR NAMESPACE>): velero-7.1.1, app version 1.14.0
- Kubernetes version (use kubectl version):
  Client Version: v1.30.2
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.29.5+k3s1
- Kubernetes installer & version: k3s (v1.29.5+k3s1)
- Cloud provider or hardware configuration: Baremetal
- OS (e.g. from /etc/os-release): Ubuntu 23.10