benbjohnson/litestream

`plndr-cp-lock` failure with kube-vip pod when backing up K3s sqlite database


I'm trying to use Litestream to back up the database of my 3-node K3s cluster running on Raspberry Pi 4s. It's a cluster with a single primary node and two worker nodes.
I've written Ansible code to deploy the Litestream systemd service to the primary node and replicate to Backblaze. I can confirm that the service runs, connects to my bucket, and starts replicating to it.
However, I've noticed that certain pods start restarting while it runs. I've tried different sync-interval settings with no success.
The most detrimental of these is the kube-vip pod, which dies with the following errors:

E0228 20:52:15.433056       1 leaderelection.go:369] Failed to update lock: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock?timeout=10s": context deadline exceeded
I0228 20:52:15.434181       1 leaderelection.go:285] failed to renew lease kube-system/plndr-cp-lock: timed out waiting for the condition
error: http2: client connection lost

While the pod restarts, I lose access to the VIP and the API. I don't see any errors in the Litestream logs when I run journalctl -xf -u litestream, just replication messages. However, while writing this I realized that the log level defaults to INFO; perhaps I can temporarily raise it and observe. In the meantime, I want to see if anyone else in the community has run into this and whether there are any solutions I can try.
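For anyone trying the same thing, raising the log level might look like the snippet below. This is only a sketch: I believe newer Litestream releases accept a top-level logging section in the config file, but that's an assumption to verify against whatever version you have installed.

# Assumption: the logging section is only honored by newer Litestream releases;
# check your version's docs before relying on it.
logging:
  level: debug

After that change, restarting the service and tailing journalctl -u litestream -f should surface the extra detail.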

I'm running the latest kube-vip tag, v0.7.1.

Below is my current (redacted) config:

access-key-id: [redacted]
secret-access-key: [redacted]

dbs:
  - path: /var/lib/rancher/k3s/server/db/state.db
    replicas:
      - type: s3
        bucket: mybucket
        endpoint: s3.us-xxxx-xxx.backblazeb2.com
        path: litestream
        force-path-style: true
        sync-interval: 30s