ydb-platform/ydb-kubernetes-operator

feat: Improve operator for day-2 tasks


Feature Request

Describe the Feature Request
The operator could enable day-2 workflows declaratively.
Some of these issues are called out in the docs [1], but day-2 support seems fairly limited. I have been using CrunchyData's Postgres operator and have looked into the CloudNativePG operator as well; both offer a pretty good blueprint of what could work.

1. Backups. Following the CrunchyData example, the backup section could take a secret with S3 credentials, schedules for full and incremental backups, and other configuration for PITR (point-in-time recovery):

   ```yaml
   backups:
     pgbackrest:
       configuration:
       - secret:
           name: pgo-gcs-creds
       image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.47-0
       manual:
         options:
         - --type=full
         repoName: repo1
       repos:
       - name: repo1
         s3:
           bucket: <BUCKET>
           endpoint: <ENDPOINT>
           region: <REGION>
         schedules:
           full: 0 7 */2 * *
   ```
2. Storage size increase
   Crunchy allows increasing the number of replicas and the storage size of each instance. According to the YDB docs, the cluster configuration is static, so these changes cannot be made by updating the manifest and re-applying it. I have not tested whether manually increasing the PersistentVolumeClaim size works, but doing this in the manifest would be ideal.

3. Affinity
   I believe the operator supports node affinity through the CRD, but this isn't called out in the documentation, so one has to check the CRD definition. An example would probably suffice as well.

4. Users/Databases
   This is not a strict must, and Database is a separate CRD object, so this point may be somewhat moot, but Crunchy encapsulates databases and users in the same CRD.

5. Similar to (4), Crunchy encapsulates monitoring (the exporter) and the administrative UI (pgAdmin) in a single CRD object; that approach could work better here too.
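
For the manual workaround mentioned under the storage size item, a minimal, untested sketch of a PersistentVolumeClaim resize in plain Kubernetes terms might look like this. It assumes the StorageClass backing the YDB volumes can have `allowVolumeExpansion: true` and that its CSI driver actually supports expansion; `<STORAGE_CLASS>`, `<PROVISIONER>`, `<PVC_NAME>`, and the `200Gi` size are placeholders, not values taken from the YDB operator.

```yaml
# StorageClass that permits volume expansion.
# Placeholder name and provisioner; expansion also requires CSI driver support.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: <STORAGE_CLASS>
provisioner: <PROVISIONER>
allowVolumeExpansion: true
---
# Fragment to merge into an EXISTING claim (placeholder name), e.g. via
# kubectl apply or kubectl patch: only spec.resources.requests.storage changes.
# Kubernetes allows growing a bound PVC but not shrinking it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <PVC_NAME>
spec:
  resources:
    requests:
      storage: 200Gi   # hypothetical new size, larger than the current one
```

If the operator creates the storage pods from a StatefulSet (an assumption here), that StatefulSet's `volumeClaimTemplates` cannot be edited, so claims created later would still get the original size. This is one reason a manifest-level option would be preferable to patching claims by hand.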

[1] https://ydb.tech/docs/en/getting_started/kubernetes#:~:text=The%20cluster%20configuration%20is%20static

Describe Preferred Solution
A more capable CRD would help. The CLI is fine, but it is hard to fit into DevOps workflows.

If the feature request is approved, would you be willing to submit a PR?
No. I'm not sure I know the tool well enough to build this into the operator.

@tunatoksoz thank you for the feedback!

We have been thinking about improvements 1 and 2 as well, even though they are not high on our list of priorities. Backups in larger YDB installations are currently managed by a separate control-plane component, which is likely to be open-sourced in the near future as well, so we'd like to give it some time to see where the boundaries of responsibility lie between the operator and that control-plane component.

The easiest suggestion to address is 3: I will make sure the documentation includes an example of how to set up node affinity (because it is indeed supported).
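
A minimal sketch of what such an example could look like, assuming the `Storage` resource (`apiVersion: ydb.tech/v1alpha1`) accepts a standard Kubernetes `affinity` block that is passed through to the storage pods. This is a fragment, not a complete manifest: the resource name and the `ydb-node: storage` node label are hypothetical, and only the `nodeAffinity` structure itself is the regular Kubernetes pod API.

```yaml
apiVersion: ydb.tech/v1alpha1
kind: Storage
metadata:
  name: storage-sample            # hypothetical resource name
spec:
  # ...other required Storage fields omitted...
  affinity:                       # assumed: standard Kubernetes Affinity for storage pods
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: ydb-node         # hypothetical node label key
            operator: In
            values:
            - storage             # schedule only onto nodes labeled ydb-node=storage
```

`requiredDuringSchedulingIgnoredDuringExecution` makes the rule a hard scheduling constraint; a softer preference would use `preferredDuringSchedulingIgnoredDuringExecution` with weighted terms instead.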

Items 4 and 5 are unlikely to change in the immediate future; as you've said, this is not a strict must. However, idea 4 does sound appealing to me, so I'll make sure it reaches the team and we'll give it more thought.