planetscale/vitess-operator

vtctldclient backup not working with defined VitessBackupStorages in cluster

voarsh2 opened this issue · 5 comments

I run

./vtctldclient --server 192.168.100.103:31487 Backup --allow-primary zone1-30573399

I get:

E0914 01:23:16.047192 3349604 main.go:56] rpc error: code = Unknown desc = TabletManager.Backup on zone1-0030573399 error: unable to get backup storage: no registered implementation of BackupStorage: unable to get backup storage: no registered implementation of BackupStorage

However, in the cluster spec I have defined a hostPath for the backups, and I can see it show up in the VitessBackupStorage resources:

apiVersion: planetscale.com/v2
kind: VitessBackupStorage
metadata:
  creationTimestamp: '2023-09-14T00:52:24Z'
  generation: 1
  labels:
    backup.planetscale.com/location: ''
    planetscale.com/cluster: example
  managedFields:
    - apiVersion: planetscale.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:backup.planetscale.com/location: {}
            f:planetscale.com/cluster: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"272e7a69-a91f-4196-ad2d-8930c88c2715"}: {}
        f:spec:
          .: {}
          f:location:
            .: {}
            f:volume:
              .: {}
              f:hostPath:
                .: {}
                f:path: {}
                f:type: {}
      manager: vitess-operator
      operation: Update
      time: '2023-09-14T00:52:24Z'
    - apiVersion: planetscale.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:observedGeneration: {}
      manager: vitess-operator
      operation: Update
      subresource: status
      time: '2023-09-14T00:53:06Z'
  name: example-90089e05
  namespace: vitess
  ownerReferences:
    - apiVersion: planetscale.com/v2
      blockOwnerDeletion: true
      controller: true
      kind: VitessCluster
      name: example
      uid: 272e7a69-a91f-4196-ad2d-8930c88c2715
  resourceVersion: '233908598'
  uid: 24068990-be71-49b5-ad17-773f581170a9
spec:
  location:
    volume:
      hostPath:
        path: /mnt/minio-store/vitess-backups
        type: Directory
status:
  observedGeneration: 1

I restarted the primary and saw "--file_backup_storage_root=/vt/backups/example" was added, but this is not the path I specified in the cluster.

Hi @voarsh2,

This is not the best place to try and get support/help. You should instead use the Vitess slack as this kind of thing requires a lot of back and forth: https://vitess.io/community/ There are also many people there from the community that are using the operator in production.

I don't know anything about your setup (k8s version, vitess-operator version, etc.), what you've done (e.g. the VitessCluster CRD definition you used), or what you want to do (how you want the backups to be performed).

It's clear that something isn't quite right, but without any details I cannot say what.

In the meantime you can find the CRD/API reference here: https://github.com/planetscale/vitess-operator/blob/main/docs/api.md

You can see some example walkthroughs here: https://github.com/planetscale/vitess-operator/tree/main/docs

And a blog post: https://vitess.io/blog/2020-11-09-vitess-operator-for-kubernetes/

And the Vitess backup docs: https://vitess.io/docs/17.0/user-guides/operating-vitess/backup-and-restore/

The backups are very configurable and again I have no idea what you've specified. At the Vitess level, the error you shared is Vitess telling you that the component (vtctld, vttablet, vtbackup) has no value for its --backup_storage_implementation flag. Which backup implementation are you trying to use, e.g. file, s3, ceph, ...? https://github.com/planetscale/vitess-operator/tree/main/pkg/operator/vitessbackup
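
For the file implementation, for example, that boils down to the component ending up with a pair of flags along these lines (illustrative values only, not taken from your setup):

--backup_storage_implementation=file
--file_backup_storage_root=&lt;some writable directory&gt;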

Between k8s (each install is a snowflake), Vitess, and the Vitess Operator this gets complicated, which is why Slack is easier for things like this. I know this is complicated for you as well, and the docs for the operator are largely non-existent, but we'd need much more detail in order to try and help.

I can only guess that perhaps you specified something like this in your CRD:

spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /backup
          type: Directory

But guessing doesn't help. 🙂 After knowing the actual CRD definition, we'd have to look at the pod definitions, logs, etc.

Best Regards

Howdy @mattlord

This is not the best place to try and get support/help. You should instead use the Vitess slack as this kind of thing requires a lot of back and forth: https://vitess.io/community/ There are also many people there from the community that are using the operator in production.

Will look to try Slack next time.

As you pointed out:

spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /mnt/minio-store/vitess-backups
          type: Directory

This is what I used for the Vitess Cluster config.

I've read most of those links.
The problem now is that, despite the hostPath, the DB pods have --file_backup_storage_root=/vt/backups/example in their command args, which is not the path I specified.

So, when running ./vtctldclient --server 192.168.100.103:31487 BackupShard commerce/- I get:

rpc error: code = Unknown desc = TabletManager.Backup on zone1-2469782763 error: StartBackup failed: mkdir /vt/backups/example: permission denied: StartBackup failed: mkdir /vt/backups/example: permission denied

Notice it's not using the hostPath I specified in the cluster configuration.

VitessBackupStorage

apiVersion: planetscale.com/v2
kind: VitessBackupStorage
metadata:
  labels:
    backup.planetscale.com/location: ""
    planetscale.com/cluster: example
  name: example-90089e05
  namespace: vitess
spec:
  location:
    volume:
      hostPath:
        path: /mnt/minio-store/vitess-backups
        type: Directory

VitessCluster: example

apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/3zPwY6sIBCF4Xeptdoq0gjb+w69L4oizR0EIzWdSTq++8SZ/SzPv/iS8wbc04OPlmoBB3vGwtIIMw9Ut9trhg4+Ugng4JGEW/uXP5vwAR1sLBhQENwbsJQqKKmWds3q/zNJYxmOVAdCkcxDqrd0OQFjGLUx/R1Z9cuiYo8jh54mY+JiVz8vFs4OMnrOf3JPbE9wgGQU0TytaraavPKaFCkzTlGrqNXk7ervs50utODG4IC/cNszw29oO9JVXz8P4Ty/AwAA//+KvyL+FgEAAA
    objectset.rio.cattle.io/id: dafd0577-6ae3-443f-a0ed-c177f498b249
  labels:
    objectset.rio.cattle.io/hash: ac73cc2183295cb3b5c3c3701f53f531b98b6291
  name: example
  namespace: vitess
spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /mnt/minio-store/vitess-backups
          type: Directory
  cells:
  - gateway:
      authentication:
        static:
          secret:
            key: users.json
            name: example-cluster-config
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
    name: zone1
  images:
    mysqld:
      mysql80Compatible: vitess/lite:latest
    mysqldExporter: prom/mysqld-exporter:v0.11.0
    vtadmin: vitess/vtadmin:latest
    vtbackup: vitess/lite:latest
    vtctld: vitess/lite:latest
    vtgate: vitess/lite:latest
    vtorc: vitess/lite:latest
    vttablet: vitess/lite:latest
  keyspaces:
  - durabilityPolicy: semi_sync
    name: commerce
    partitionings:
    - equal:
        parts: 1
        shardTemplate:
          databaseInitScriptSecret:
            key: init_db.sql
            name: example-cluster-config
          tabletPools:
          - cell: zone1
            dataVolumeClaimTemplate:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
            mysqld:
              resources:
                requests:
                  cpu: 100m
                  memory: 512Mi
            replicas: 3
            type: replica
            vttablet:
              extraFlags:
                db_charset: utf8mb4
                disable_active_reparents: "true"
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
  - durabilityPolicy: semi_sync
    name: betawonder3
    partitionings:
    - equal:
        parts: 1
        shardTemplate:
          databaseInitScriptSecret:
            key: init_db.sql
            name: example-cluster-config
          tabletPools:
          - cell: zone1
            dataVolumeClaimTemplate:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
            mysqld:
              resources:
                requests:
                  cpu: 500m
                  memory: 512Mi
            replicas: 1
            type: replica
            vttablet:
              extraFlags:
                db_charset: utf8mb4
                disable_active_reparents: "true"
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
    turndownPolicy: Immediate
  updateStrategy:
    type: Immediate
  vitessDashboard:
    cells:
    - zone1
    extraFlags:
      security_policy: read-only
    replicas: 1
    resources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  vtadmin:
    apiAddresses:
    - http://192.168.100.103:31252
    apiResources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    cells:
    - zone1
    rbac:
      key: rbac.yaml
      name: example-cluster-config
    readOnly: false
    replicas: 1
    webResources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi

https://github.com/planetscale/vitess-operator/blob/main/docs/api.md#planetscale.com/v2.VitessBackup - this doesn't show hostPath as a valid option, but I saw it in the operator's sample YAML. In any case, I can't see any obvious reason why this isn't working, and I'm not sure why the DB pods are using /vt/backups when I specify a different path. I might give S3 a try next...

I looked at the code and here is what I found:
The operator uses the volume configuration provided in the YAML to create a volume called vitess-backups.

func fileBackupVolumes(volume *corev1.VolumeSource) []corev1.Volume {
	return []corev1.Volume{
		{
			Name:         fileBackupStorageVolumeName,
			VolumeSource: *volume,
		},
	}
}

Next, the operator mounts this volume at a fixed, hardcoded path in the vtbackup and vtctld pods. The path that is used is /vt/backups.

func fileBackupVolumeMounts(subPath string) []corev1.VolumeMount {
	return []corev1.VolumeMount{
		{
			Name:      fileBackupStorageVolumeName,
			MountPath: fileBackupStorageMountPath,
			SubPath:   subPath,
		},
	}
}

Since the volume is mounted at /vt/backups, that path is what ends up in the flags for vtctld and vtbackup:

func fileBackupFlags(clusterName string) vitess.Flags {
	return vitess.Flags{
		"backup_storage_implementation": fileBackupStorageImplementationName,
		"file_backup_storage_root":      rootKeyPrefix(fileBackupStorageMountPath, clusterName),
	}
}
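
For reference, the names used in these snippets are, as far as I can tell, a handful of constants plus a small helper in pkg/operator/vitessbackup. The sketch below is my reading of them, with the values inferred from the behaviour described above rather than copied verbatim from the code:

const (
	fileBackupStorageVolumeName         = "vitess-backups"
	fileBackupStorageMountPath          = "/vt/backups"
	fileBackupStorageImplementationName = "file"
)

// rootKeyPrefix joins the fixed mount path with the cluster name, which is how
// "/vt/backups" plus the cluster name "example" becomes "/vt/backups/example".
func rootKeyPrefix(mountPath, clusterName string) string {
	return path.Join(mountPath, clusterName) // "path" from the standard library
}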

So while taking a backup, vtbackup will try to create a directory named after the cluster (example in your case) under that mount and then write the backup there.

☝️ This explains why you are seeing /vt/backups in the error messages: the vtctld and vtbackup binaries have the volume mounted at that directory.
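
To make the failure mode concrete, here is a hypothetical, simplified sketch (not the actual Vitess code) of the step that is blowing up: the file backup storage has to create the target directory under the configured root before it can write any backup files.

// ensureBackupDir is an illustrative stand-in for what the file backup storage
// effectively does with --file_backup_storage_root=/vt/backups/example.
func ensureBackupDir(root string) error {
	// If the hostPath mounted at /vt/backups is only writable by root on the
	// node, this is the call that fails with
	// "mkdir /vt/backups/example: permission denied".
	return os.MkdirAll(root, 0o755) // "os" from the standard library
}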

Unfortunately, I don't know why the volume mount is inaccessible: rpc error: code = Unknown desc = TabletManager.Backup on zone1-2469782763 error: StartBackup failed: mkdir /vt/backups/example: permission denied: StartBackup failed: mkdir /vt/backups/example: permission denied.
One possible reason is that the host directory /mnt/minio-store/vitess-backups doesn't allow all users to create directories inside it 🤷‍♂️; maybe only the root user is permitted to. Could you try changing the permissions on it, or try a different directory that doesn't have this problem? Even in the e2e tests that Vitess runs to verify that backups work, we have to run mkdir -p -m 777 ./vtdataroot/backup to open up the permissions on the backup directory we mount so that all users can create directories inside it.
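
For example (purely as a suggestion, adjust to your environment), running something like sudo chmod 777 /mnt/minio-store/vitess-backups on the node(s) that back the hostPath, or chown-ing that directory to the uid the vttablet containers run as, should quickly confirm or rule out the permissions theory.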