nolar/kopf

Automatic CRD Manifest Generation

tinyzimmer opened this issue · 26 comments

Problem

I personally use operator-sdk and kubebuilder a lot, and one of the things I love about them is the ability to generate CustomResourceDefinition manifests from the type declarations in my Go code.

Wouldn't it be awesome if kopf could do this?! Beyond the doc/CRD generation potential, kopf handlers could in the future receive populated class instances when they run, adding a sort of "object safety" to the mix.

Are there existing features close to solving this problem? Why don't they work?

Not that I'm aware of.

Proposal

For starters, I'm toying with what similar CRD generation could look like in Python. The idea is that, along with their decorated functions, the user can provide a decorated class. An additional command such as kopf generate k8s (or something) can then pull in these decorated classes and use them to generate a CRD.

I have proof-of-concept code that provides the functionality outlined below. I'm happy to open a PR to try to stitch it into kopf.

@kopf.CRD(group="kopf.io", status_subresource=True)  # I have several additional options already such as defining 'scope', 'version', etc.
class RedisCluster(object):

    def spec(self):
        """
        @config -- Configuration options for redis.
        @someString -- This is an example string.
        @someBool -- This is an example bool object.
        @someInt -- This is an example integer object.
        @someList -- This is an example array object.
        """
        return {
            'config': RedisConfig,
            'someString': str,
            'someBool': bool,
            'someInt': int,
            'someList': [str]
        }

class RedisConfig(object):

    def attrs(self):
        """
        @someString -- This is an example string inside the redis config.
        """
        return {
            'someString': str
        }

print(RedisCluster.generate_k8s())  # Generates the CRD yaml

The decoration on the primary RedisCluster object (at least in what I have so far) adds a generate_k8s() method to the class, which is what the kopf generate k8s command could call. The example above produces this:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: redisclusters.kopf.io
spec:
  group: kopf.io
  names:
    kind: RedisCluster
    listKind: RedisClusterList
    plural: redisclusters
    singular: rediscluster
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: RedisCluster is the Schema for the redisclusters API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: RedisClusterSpec defines the desired state of RedisCluster
            properties:
              config:
                description: Configuration options for redis.
                properties:
                  someString:
                    description: This is an example string inside the redis config.
                    type: string
                type: object
              someBool:
                description: This is an example bool object.
                type: boolean
              someInt:
                description: This is an example integer object.
                type: integer
              someList:
                description: This is an example array object.
                items:
                  type: string
                type: array
              someString:
                description: This is an example string.
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}

Full support could include the ability to do anyOf and enums; that was a little too involved for the quick POC I put together.

So...whadya think?

Checklist

  • Many users can benefit from this feature, it is not a one-time case
  • The proposal is related to the K8s operator framework, not to the K8s client libraries

Just for the flip side of how this could be used when reading in objects during a reconcile, I've extended the POC to offer this:

# Assuming a decorated RedisCluster like above
obj = RedisCluster.from_dict(
    data={
        'metadata': {
            'name': 'test-cluster',
            'namespace': 'test-namespace'
        },
        'spec': {
            'config': {
                'someString': 'test-value'
            },
            'someString': 'another-value',
            'someBool': True,
            'someList': ['hello-world']
        }
    }
)
print(vars(obj.spec))
print(vars(obj.spec.config))

# Produces
# {'config': <__main__.RedisConfig object at 0x7f2a7454b820>, 'someString': 'another-value', 'someBool': True, 'someInt': None, 'someList': ['hello-world']}
# {'someString': 'test-value'}

This could potentially make for a pretty slick coding experience for the users. I could define additional methods on my RedisCluster object that I can use during runtime. For example:

@kopf.CRD(group="kopf.io")
class RedisCluster(object):
    # Required spec() definition as defined above

    def namespaced_name(self):
        return f'{self.namespace}/{self.name}'

@kopf.on.create('kopf.io', 'v1alpha1', 'redisclusters')
def create_fn(spec, meta, status, **kwargs):
    obj = RedisCluster.from_dict(data={...})  # Could potentially be done internally and passed instead of spec/meta/status
    print(obj.namespaced_name())
    # Produces: test-namespace/test-cluster
nolar commented

Hello, @tinyzimmer! Thanks for this suggestion and the detailed explanation with examples.

Indeed, Kopf SDK is something I already thought about while writing piles of YAML files for CRDs, RBAC, Deployments, etc. It sometimes happens that there is more YAML code in the operator than Python code — ironically.

Such an SDK should have three main functions: generating YAMLs, validating that the YAMLs are still in sync with the code (for CI/pre-commit), and checking if the cluster is still in sync with the code's expectations (i.e. YAMLs are applied).

"In sync with the code" means that the SDK has to parse the codebase and understand what is happening there. Not only it can get the list of CRDs being monitored (that is easy), but it can also dive into the handlers and extract the CRDs created/updated/deletion (for known/supported clients) — for RBAC. I already started a PoC some time ago, based on Python's AST — but the selecting/traversing logic is awful: too low-level.

One thing I am concerned about is that generated code stored in the repository is a code smell (my opinion), even if it is YAML code. However, it looks like the smallest of the evils when it comes to CI/CD systems — as long as the Python code is the source of truth, and the YAMLs are never manually edited.


Regarding the class definition, I thought of using modern Python capabilities like customized/parametrized classes, and dataclasses-like type annotations for fields:

import dataclasses
from typing import List, Mapping, Literal

import kopf

RedisSpecItem = Literal["hello", "world"]


@dataclasses.dataclass
class RedisConfig:
    someString: str


@dataclasses.dataclass
class RedisSpec(kopf.Spec):
    config: RedisConfig
    someDict: Mapping[str, str]
    someList: List[RedisSpecItem]
    """
    A list of shiny strings. This docstring goes to the field's description.
    The Literal[] type annotation can also restrict the OpenAPI schema to these values only.
    """

    someStr: str = 'default value'
    someInt: int = 100
    someBool: bool = False


# Inheritance ensures that .metadata & .status are also declared,
# and IDEs/type-checkers can see them natively.
class RedisCluster(kopf.Resource, group='example.com', version='v1', status_subresource=True):
    """
    This is a cluster of Redis (goes to description).
    """
    spec: RedisSpec


@RedisCluster.on.creation
def create_fn(body: RedisCluster, spec: RedisSpec, **_):
    print(spec.someList)

Such an approach would provide the fields not only at runtime, but also at type-checking time (e.g. when running mypy on a Kopf-based operator) and for auto-completion in IDEs.

It is a question worth investigating — how far Python features can be mapped to OpenAPI/CRD definitions while keeping the codebase as Pythonic as possible, without inventing a mini-DSL for OpenAPI-specific things in the docstrings.
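For the simple cases, the mapping is rather mechanical. A rough sketch (illustration only, not a committed design; the function name is arbitrary):

import collections.abc
import dataclasses
import typing

# Illustration only: a naive mapping of Python type annotations to OpenAPI v3 schemas.
SCALARS = {str: 'string', int: 'integer', float: 'number', bool: 'boolean'}

def to_schema(tp: object) -> dict:
    if tp in SCALARS:
        return {'type': SCALARS[tp]}
    if dataclasses.is_dataclass(tp):
        # NB: f.type can be a string under PEP 563; typing.get_type_hints() is needed then.
        return {'type': 'object',
                'properties': {f.name: to_schema(f.type)
                               for f in dataclasses.fields(tp)}}
    origin, args = typing.get_origin(tp), typing.get_args(tp)
    if origin is list:
        return {'type': 'array', 'items': to_schema(args[0])}
    if origin in (dict, collections.abc.Mapping):
        return {'type': 'object', 'additionalProperties': to_schema(args[1])}
    if origin is typing.Literal:
        return {'type': 'string', 'enum': list(args)}  # assuming string literals
    raise TypeError(f'No OpenAPI mapping for {tp!r}')

The hard part is everything beyond that: unions, optionals, defaults, descriptions from docstrings, and so on.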


This, however, brings Kopf to a completely new field — the API clients. Until now, I intentionally postponed this expansion and kept Kopf "client-agnostic", recommending pykube-ng as the most object-oriented client (though synchronous only, with no asyncio support). But the "official" client would also work. Any other client would work too. Raw HTTPS/API calls would also work. Even kubectl as a subprocess would work.

Yet more and more often, I hit issues even with that client that make it difficult to implement typical operator patterns: e.g. child object creation. It is easy, but not as easy as I would like it to be. I prefer to yield an object definition from the handler and let the framework create/update/apply it, rather than doing this again and again manually. (On a side-note: for this, the AWS K8s CDK looks promising DSL-wise, but I haven't put my hands on it yet.)
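Purely as a hypothetical syntax (nothing like this exists in Kopf now):

# Hypothetical syntax only; Kopf has no such feature at the moment.
@kopf.on.create('example.com', 'v1', 'redisclusters')
def create_fn(name, namespace, **_):
    # The framework would own the create/update/apply logic for yielded objects.
    yield {'apiVersion': 'v1', 'kind': 'Service',
           'metadata': {'name': f'{name}-redis', 'namespace': namespace},
           'spec': {'ports': [{'port': 6379}]}}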

This kind of expansion would be a big step for the framework, and it would require a lot of effort. Therefore, I prefer to first finish the remaining major features of Kopf as an operator framework (these are the last 2 big things left from my initial vision of the operator framework):

  • Admission hooks (validation/mutation).
  • Cross-resource and contextual handlers (e.g., when a pod changes that is owned by MyResource, I want both the pod's and the resource's info in the handler).

(And numerous little DevEx improvements here & there, but they don't make the story-telling.)

Once that is done, a new vision of Kopf as a K8s API client can be imagined — with tight and native integration with the existing features.

It is hard to imagine all these features now, so a narrative is needed to collect the ideas. GitHub is not very convenient for such a strategic discussion.

Thanks for the extremely detailed write-up. Your example is far and away cleaner than what I put together, but it's in the same spirit. The extra context you provided helps me understand that there is definitely a more correct approach to take towards this idea in general. I'm definitely willing to stay tuned and try to help this project along. I still prefer go for my day-to-day activities, but every once in a while I find myself wanting to write a very small, quick and dirty automation, and scaffolding an entire operator project seems like overkill.

I pushed my POC to a fork of this repo if you want to take a peek at it for your own interest. I haven't used typing in Python extensively, so my code is pretty gross compared to yours. For example, I much prefer your version using dataclasses. You are right that, no matter what, it becomes low-level and gross. You can see the commit here: tinyzimmer@606d7cc

My addition to the CLI provides the kopf generate_k8s my_handler.py command as described above.

Just to add on to something:

Cross-resource and contextual handlers (e.g., when a pod changes that is owned by MyResource, I want both the pod's and the resource's info in the handler).

This is a pretty interesting feature that I don't think exists in controller-runtime either. I can watch for changes to Deployments that are owned by MyResource, but I still have to fetch the object myself from the API; the watch simply triggers my reconcile loop. You can specify custom handlers for resources, but again, you still have to fetch the object.

During a typical reconcile, you are just given the "NamespacedName" of the object. During an admission hook you also have the body of the object, but not as it exists remotely; rather, as the request wants it to become. The code generators lay down Reconcile functions that typically look like this (paraphrased to Python):

def reconcile_object(request) -> Error:
    myResource = client.get(request.NamespacedName)
    # Insert your code here

I can see why what you describe there could be useful. But I don't fully understand why it's a required feature.

@tinyzimmer You might be interested in my Python Kubernetes ORM library, pykorm: https://github.com/Frankkkkk/pykorm

It doesn't have CRD-generating code yet, but that would indeed be a really good feature.

Cheers

Maybe https://pydantic-docs.helpmanual.io/ offers a good starting point

Just for the record. I've been having some fun with pydantic lately.
Maybe a more powerful alternative to dataclasses and friends.

Just to add to this: using pydantic is IMHO definitely a good option, as it will generate an OpenAPI-compatible schema. Pydantic can also work with Python dataclasses, but I haven't looked into that.

Here's some working code to show how a pydantic model can be used to generate a CRD:

from kubernetes import client
from pydantic import BaseModel


class TestModel(BaseModel):
    name: str
    size: int
    size2: int


def create_crd():
    schema = TestModel.schema()
    body = {
        'apiVersion': 'apiextensions.k8s.io/v1',
        'kind': 'CustomResourceDefinition',
        'metadata': {'name': 'stuffs.mycompany.com'},
        'spec': {
            'group': 'mycompany.com',
            'names': {'kind': 'Stuff', 'plural': 'stuffs'},
            'scope': 'Namespaced',
            'versions': [
                {
                    'name': 'v1',
                    'schema': {'openAPIV3Schema': schema},
                    'served': True,
                    'storage': True,
                }
            ],
        },
    }
    api_extensions_v1_api = client.ApiextensionsV1Api()
    object_does_exist = False
    try:
        print('attempting to create crd')
        api_extensions_v1_api.create_custom_resource_definition(body)
    except client.exceptions.ApiException as e:
        if e.status != 409:  # 409 Conflict: anything else is a real error
            raise
        print('crd already exists')
        object_does_exist = True

    if object_does_exist:
        print('updating crd')
        api_extensions_v1_api.patch_custom_resource_definition(
            'stuffs.mycompany.com', body
        )

This would also have the benefit of allowing us to parse the body of a created/updated object into a defined object for added validation, default values, etc.:

@kopf.on.create('mycompany.com', 'v1', 'stuffs')
def create_fn(body, **kwargs):
    test_thing = TestModel.parse_obj(body)  # this is now a TestModel instance

Unfortunately, I can't use pydantic models anywhere in kopf code because of #631, but hopefully that's something that can be solved.

I've taken these ideas for a ride at https://github.com/asteven/kopf/tree/resources.

The following, taken from one of my pet projects, is now basically working.
(For context: it's an SSH host certificate manager based on Vault, inspired by cert-manager.)

from typing import Optional

from pydantic import BaseModel, Field

import kopf

class SecretRef(BaseModel):
    """Reference to a Secret of the given name.
    Optionally references a specific `key` inside the secret's
    `data` field.
    """
    name: str
    key: Optional[str] = None


class IssuerSpec(kopf.Spec):
    path: str = Field(description='The mount path of the Vault SSH backend.')
    server: str = Field(description='The connection address for the Vault server, e.g: "https://vault.example.com:8200".')
    role: str = Field(description='The vault role to use to issue certificates.')
    tokenSecretRef: SecretRef


class Issuer(kopf.Resource, group='ssh-cert-manager.io', version='v1', scope='Namespaced'):
    """A Issuer represents a vault ssh certificate authority which can be
    referenced as part of `issuerRef` fields. It is scoped to a single
    namespace and can therefore only be referenced by resources within the
    same namespace."""
    spec: IssuerSpec


class ClusterIssuer(kopf.Resource, group='ssh-cert-manager.io', version='v1', scope='Cluster'):
    """A ClusterIssuer represents a vault ssh certificate authority which can
    be referenced as part of `issuerRef` fields. It is similar to an Issuer,
    however it is cluster-scoped and therefore can be referenced by resources
    that exist in *any* namespace, not just the same namespace as the referent."""
    spec: IssuerSpec

To generate a CRD you can now do this:

import yaml
crd = ClusterIssuer.as_crd()
print(yaml.dump(crd, sort_keys=False))

The following also works in a kopf handler module.

@ClusterIssuer.on.create
@ClusterIssuer.on.resume(when=when_filter)
@ClusterIssuer.on.update
def create_cluster_issuer(name, namespace, body, meta, spec, patch, logger, **_):
    issuer = ClusterIssuer.parse_obj(body)
    print(issuer)
    print(issuer.metadata.name)
    print(issuer.spec.server)
    print(type(issuer.spec))

Still a lot to do, and no tests or docs. But it's a start ;-)

One thing I've found is that Kubernetes doesn't seem to like pydantic's way of generating "linked" schemas (when TypeA contains an attribute of another model type); it feels to me like Kubernetes only implements part of the OpenAPI spec. So one limitation is that we won't be able to do everything that pydantic allows.

Here's an example of a schema model that Kubernetes would have problems creating a CRD from:

from pydantic import BaseModel

class SomeStuff(BaseModel):
    age: int

class What2(BaseModel):
    name: str
    foo: SomeStuff
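Specifically, pydantic renders the nested model as a $ref into a top-level definitions section. Something like this (abridged):

import json
print(json.dumps(What2.schema(), indent=2))
# Prints (abridged) something like:
# {
#   "title": "What2",
#   "type": "object",
#   "properties": {
#     "name": {"title": "Name", "type": "string"},
#     "foo": {"$ref": "#/definitions/SomeStuff"}
#   },
#   "definitions": {"SomeStuff": {...}}
# }
# Kubernetes CRD structural schemas do not allow $ref/definitions,
# so the schema must be inlined first.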

One thing I've found is that Kubernetes doesn't seem to like pydantic's way of generating "linked" schemas (when TypeA contains an attribute of another model type); it feels to me like Kubernetes only implements part of the OpenAPI spec. So one limitation is that we won't be able to do everything that pydantic allows.

I've taken care of that by dereferencing the schema.
See https://github.com/asteven/kopf/blob/53d82e5014a2c14e761d4efcce2f05bb3ed90590/kopf/resources.py#L10
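In essence, it inlines every $ref before building the CRD. A minimal sketch of the idea (not the actual code from that link; recursive models would additionally need a cycle guard):

def deref(node, definitions):
    # Recursively replace {'$ref': '#/definitions/X'} with the definition body.
    if isinstance(node, dict):
        if '$ref' in node:
            name = node['$ref'].rsplit('/', 1)[-1]
            return deref(definitions[name], definitions)
        return {key: deref(value, definitions)
                for key, value in node.items() if key != 'definitions'}
    if isinstance(node, list):
        return [deref(item, definitions) for item in node]
    return node

schema = What2.schema()
flat = deref(schema, schema.get('definitions', {}))  # no $ref/definitions left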

My example from above works. You can generate the CRDs and pipe them into kubectl,
e.g. I do this:

python my_crds.py | kubectl apply -f -
nolar commented

Super! The API/DSL of resources and their processing looks nice — as much as Python allows it to be (it is sad that there is no way to describe one model/resource in just one class, without extra classes for every sub-structure like SecretRef).

If you do not mind, I will take a closer look at this solution during the coming week or the next weekend. For today, I prefer to finish the 1.29.0 release (one bug left).


One strategic thing that I would like to pay special attention to: should this Pydantic-specific solution be part of Kopf itself? Can it be made a separate library with special native support (and a recommendation) by Kopf? Which "connection points" should Kopf provide to make it possible? Would such a library require anything from Kopf's current internals as a dependency/toolkit? How can other type-annotating/CRD-describing approaches be made possible in the future?

And on a bigger scale: is it time for Kopf to explode into an ecosystem of libraries instead of one all-inclusive library?

I can imagine that not all people will be happy with Pydantic for whatever reasons, so I also keep in mind the approaches with e.g. Python StdLib's types+dataclasses only, or parsing the schemas from OpenAPI YAML, or maybe something else.

The necessity of plugins or extensions of some kind has appeared in other aspects already (e.g. K8s API authentication), so maybe it is time to think on this topic now — following the UNIX philosophy that one tool should do one thing only, but do it well.

(On a side-note: following this, I regularly rethink if anything can be extracted from Kopf as it is too heavy already. One thing that I keep in mind is pytest-specific fixtures for operator testing — but it is one class currently and is not worth extracting. Everything else is tightly coupled and cannot be separated without damaging Kopf's core value.)


PS: The separation is rather a topic to think on. I have no strong opinion on it yet. And no clear understanding of pros & cons & criteria to judge with.

Regarding

should this Pydantic-specific solution be part of Kopf

Pydantic allows "exporting" a class/schema to an OpenAPI-compatible spec dictionary. IMHO, this would be the natural connection point/interface between a pydantic class and kopf (basically, kopf would accept any OpenAPI spec dict, not just from pydantic). The helper that "flattens" the OpenAPI spec into the structure required by Kubernetes could maybe be a "util" function in kopf.

Just my 5 cents.

I was curious how much work it would be to make this true ...

@ClusterIssuer.on.create
@ClusterIssuer.on.resume(when=when_filter)
@ClusterIssuer.on.update
def create_cluster_issuer(body: ClusterIssuer, **_):
    assert type(body) == ClusterIssuer
    print('type(body): %s' % type(body))

Turns out it was trivial. asteven@e41a8ae

@nolar So far, nothing of this has to be inside the kopf package. It could just as well be in its own package.
Not sure, though; a batteries-included approach could also be better. It doesn't help with building a community and a critical mass of users and contributors when there are 100 different ways of doing things.

Pushed the whole lot to its own repository for easier coding/playing/sharing.
https://github.com/asteven/kopf_resources

nolar commented

So far, I've quickly read through the Pydantic docs and did some quick, shallow research on the alternatives.

There are not so many, as it turns out: pydantic, attrs, and schematics. All others are far behind in terms of adoption. Of these three, schematics looks unmaintained: not a single commit since Dec 2018 (two years). Attrs looks alive and prospering, though Pydantic is more "popular" (using GitHub stars as a proxy for popularity).

I guess we can safely embed Pydantic as the default resource modelling engine, with an easy way to switch to Attrs for those who wish (via settings, probably). There are no other libraries worth considering, or I could not find them.

Or, better, both Pydantic & Attrs should be easily configurable or supported out-of-the-box, while the default is to use pure Python dicts & co with rudimentary features.

My argument about decoupling and distancing is therefore not relevant anymore. "Batteries included" seems the way to go.


I've now quickly scanned through your repo.

I didn't fully get what "resource caches" are, but it seems I am currently drafting a very similar feature — see #661.

Regarding the implementation — that was a bit tricky to understand because of the many meta-programming hacks (inspections, dynamic fields, etc.). I will need yet another round.

Perhaps, that is because it is now separated from the framework. To safely pass through MyPy & strict type checking (mypy --strict kopf), it needs to be simplified — but it can be done later, after a proof-of-concept is ready.

Regarding the feature itself, I'd like to first extract some acceptance criteria to understand the scope of it. So, all the notes below are just to express how I understood it — please correct me if I am wrong in some places.


As I understood, the key line is here — https://github.com/asteven/kopf_resources/blob/f43a9bfdc2e572ee30bf64abc5e7c51c80925442/kopf_resources/resources.py#L93 — where it converts dicts/dict-views into Pydantic classes on every invocation.

Maybe it can be made an object factory, configurable as e.g. a cls= option on handlers (not sure about the name): a callback that takes a Mapping[str, Any] and returns the model. The default would be kopf.Body — a rudimentary dict-view. For Pydantic, it would be SomeResource (we can identify Pydantic/Attrs classes internally and call .parse_obj()). For the @SomeResource.on.creation decorators (a really nice syntax sugar to have!), that class is automatically injected into cls= by what DecoratorProxy is now.
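As a purely hypothetical illustration (the kwarg name and semantics are not decided):

# Hypothetical illustration only; no such cls= option exists yet.
@kopf.on.create('example.com', 'v1', 'redisclusters', cls=RedisCluster.parse_obj)
def create_fn(body: RedisCluster, **_):
    print(body.spec)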

Why not per-operator? Because it can have many handlers for different resources. And because it is one specific handler that uses that syntax. Other handlers may contain different syntax to access the resources.

Why not per-function? At least in our usage, we have a couple of places where several different CRDs are routed into one function. They have different spec schemas, but the same logic for processing the changes in status. Type annotations can use some base class/interface/protocol there, but the actual object passed will depend on the resource & handler, not on the function called.

  • A/C 1: Specifying resource classes and wrapping the objects into them. Per-handler.

Maybe there should also be a mixin which does the .on. injection magic for any class of any library (there are some decorators not in the kopf.on namespace). Currently, it is implemented in a base class Resource(BaseModel), but it does not rely much on anything from BaseModel, as far as I see. Is that true? Then kopf.pydantic.BaseResource could be defined as class BaseResource(DecoratorMixin, BaseModel): pass, perhaps with some well-known fields (e.g. metadata).

Please also note that since Kopf 1.29.0, there is a more sophisticated system of resource selectors (docs). The previous 3-string format is supported, but only as a subset of the new selecting logic. The actual specific and concrete resources are extracted from the cluster at start-time & runtime (in kopf.reactor.observation). This SomeResource.on.event syntax sits somewhere in-between: it is both a specific resource AND a selector. I need to think here, it is not trivial.

  • A/C 2: Mixing-in the decorators into any class, and provide prepared Pydantic & Attrs base models with that — with cls= substituted accordingly.

One tricky part will be with daemons & timers: they cache the object in-memory, wrap into an accessor object, and "substitute" the actual data in the accessor on every event from the cluster. This is the only way how a long-running function can always have a fresh value for a resource's data, even if nothing arrives from the cluster for minutes/hours/days. Implementing a Pydantic-compatible accessor might be tricky; or not.

Besides, current body parts passed in kwargs (spec, meta, status, labels, annotations, etc) should also be converted to Pydantic sub-models of that resource, I guess. But that might be easy.

  • A/C 3: Live-views into Pydantic/Attrs models in daemons; and into body-parts for all handlers.

So, did I forget anything? Is everything correct in my understanding of the solution?

And thanks for the draft — it is indeed interesting to see how this can look and work when it is alive already! ;-)

I didn't fully get what "resource caches" are, but it seems I am currently drafting a very similar feature — see #661.

I have the following pattern in several places:
PVC references and uses StorageClass.
Certificate references and uses Issuer.
etc.

So when e.g. a PVC is created, I have to hit the API server every time to fetch the related StorageClass. Same with Certificate and Issuer.

The ResourceCache basically maps the kopf.on.{create,update,resume} handlers to cache-add.
And the kopf.on.delete handler to cache-remove.

With this in place I don't need any kubernetes client library to explicitly connect and fetch from the API server.
Kopf does all the heavy lifting for me - for free. If the StorageClass or Issuer I care about exists in the cluster, it's also in my cache.
I just get it from the cache and live happily ever after.
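Schematically, it boils down to something like this (a simplified sketch, not the actual kopf_resources code):

storage_classes = {}  # a per-operator in-memory cache

@kopf.on.create('storage.k8s.io', 'v1', 'storageclasses')
@kopf.on.update('storage.k8s.io', 'v1', 'storageclasses')
@kopf.on.resume('storage.k8s.io', 'v1', 'storageclasses')
def cache_add(name, body, **_):
    storage_classes[name] = dict(body)

@kopf.on.delete('storage.k8s.io', 'v1', 'storageclasses')
def cache_remove(name, **_):
    storage_classes.pop(name, None)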

Not sure how generically useful this is. But was fun coding and works for me ;-)

Regarding the implementation — that was a bit tricky to understand because of the many meta-programming hacks (inspections, dynamic fields, etc.). I will need yet another round.

Funny that you say that. My head is still smoking from reading the kopf source ;-) It looks way more sophisticated than what I'm used to working with or am able to write myself.

Perhaps, that is because it is now separated from the framework. To safely pass through MyPy & strict type checking (mypy --strict kopf), it needs to be simplified — but it can be done later, after a proof-of-concept is ready.

Sure. I did not care much about code quality at this point. Just hacked my way through to see it working.

Regarding the feature itself, I'd like to first extract some acceptance criteria to understand the scope of it. So, all the notes below are just to express how I understood it — please correct me if I am wrong in some places.

As I understood, the key line is here — https://github.com/asteven/kopf_resources/blob/f43a9bfdc2e572ee30bf64abc5e7c51c80925442/kopf_resources/resources.py#L93 — where it converts dicts/dict-views into Pydantic classes on every invocation.

Yes. This inspects the function's arguments and converts the resource based on the type hints.

Maybe it can be made an object factory, configurable as e.g. a cls= option on handlers (not sure about the name): a callback that takes a Mapping[str, Any] and returns the model. The default would be kopf.Body — a rudimentary dict-view. For Pydantic, it would be SomeResource (we can identify Pydantic/Attrs classes internally and call .parse_obj()). For the @SomeResource.on.creation decorators (a really nice syntax sugar to have!), that class is automatically injected into cls= by what DecoratorProxy is now.

Why not per-operator? Because it can have many handlers for different resources. And because it is one specific handler that uses that syntax. Other handlers may contain different syntax to access the resources.

Why not per-function? At least in our usage, we have a couple of places where several different CRDs are routed into one function. They have different spec schemas, but the same logic for processing the changes in status. Type annotations can use some base class/interface/protocol there, but the actual object passed will depend on the resource & handler, not on the function called.

  • A/C 1: Specifying resource classes and wrapping the objects into them. Per-handler.

Sounds good.

Maybe there should also be a mixin which does the .on. injection magic for any class of any library (there are some decorators not in the kopf.on namespace). Currently, it is implemented in a base class Resource(BaseModel), but it does not rely much on anything from BaseModel, as far as I see. Is that true? Then kopf.pydantic.BaseResource could be defined as class BaseResource(DecoratorMixin, BaseModel): pass, perhaps with some well-known fields (e.g. metadata).

Yes that's true. And yes, mixin should work. I'll take a stab at that.
I just needed the values of group, version, and plural to inject into the kopf.on.* decorators, and a way to get them from somewhere. Resource.__init_subclass__ gave me access to them. I will have to see if/how that works with mixins or well-known fields.
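Something along these lines, maybe (untested sketch):

# Untested sketch: capture group/version/plural as class keywords at definition time.
class DecoratorMixin:
    def __init_subclass__(cls, group=None, version=None, plural=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._group = group
        cls._version = version
        cls._plural = plural or cls.__name__.lower() + 's'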

Please also note that since Kopf 1.29.0, there is a more sophisticated system of resource selectors (docs). The previous 3-string format is supported, but only as a subset of the new selecting logic. The actual specific and concrete resources are extracted from the cluster at start-time & runtime (in kopf.reactor.observation). This SomeResource.on.event syntax sits somewhere in-between: it is both a specific resource AND a selector. I need to think here, it is not trivial.

Not really sure I like that new selecting logic. Seems like too many different ways to do the same thing. I like 'one obvious way to do it'. But maybe I'm missing something, or it doesn't matter/hurt anyway.

  • A/C 2: Mixing-in the decorators into any class, and provide prepared Pydantic & Attrs base models with that — with cls= substituted accordingly.

Sounds good.

One tricky part will be with daemons & timers: they cache the object in-memory, wrap into an accessor object, and "substitute" the actual data in the accessor on every event from the cluster. This is the only way how a long-running function can always have a fresh value for a resource's data, even if nothing arrives from the cluster for minutes/hours/days. Implementing a Pydantic-compatible accessor might be tricky; or not.

Besides, current body parts passed in kwargs (spec, meta, status, labels, annotations, etc) should also be converted to Pydantic sub-models of that resource, I guess. But that might be easy.

If you are able to parse the body, then you should already also have all the others.

  • A/C 3: Live-views into Pydantic/Attrs models in daemons; and into body-parts for all handlers.

So, did I forget anything? Is everything correct in my understanding of the solution?

I think you understood and covered all of it.

And thanks for the draft — it is indeed interesting to see how this can look and work when it is alive already! ;-)

My pleasure. Was fun to do.

Maybe there should also be a mixin which does the .on. injection magic for any class of any library (there are some decorators not in the kopf.on namespace). Currently, it is implemented in a base class Resource(BaseModel), but it does not rely much on anything from BaseModel, as far as I see. Is that true? Then kopf.pydantic.BaseResource could be defined as class BaseResource(DecoratorMixin, BaseModel): pass, perhaps with some well-known fields (e.g. metadata).

Yes that's true. And yes, mixin should work. I'll take a stab at that.

JFYI: I've rewritten the .on. injection magic to work with a mixin and/or descriptor.

There is now a example at https://github.com/asteven/kopf_resources/tree/master/example

Also started experimenting with support for different CRD versions. Generating yaml for multiple versions is working.
https://github.com/asteven/kopf_resources/blob/resource-versions/kopf_resources/

@nolar Is this still being worked on? I've been doing quite a lot of parsing into pydantic classes in my operator, so having this built into kopf would be really nice.

There is now a example at https://github.com/asteven/kopf_resources/tree/master/example

Also started experimenting with support for different CRD versions. Generating yaml for multiple versions is working. https://github.com/asteven/kopf_resources/blob/resource-versions/kopf_resources

JFYI: All existing code has been merged to master at https://github.com/asteven/kopf_resources.

I'm using this for some projects. But it's not integrated into kopf.

nolar commented

@Roni1993 Hello. Sorry, I do not work on this task. My focus is slightly in a different area, and I cannot dedicate much time to Kopf now.

But I have learned Pydantic & FastAPI — it is a really nice approach to data structures, I love it! Though, I am not sure if or when I will have time for this in Kopf.

Maybe, you can implement a separate library for declaring the CRDs via Pydantic classes, and later add support for it to Kopf — the same as kubernetes & pykube-ng models are supported now? The exact details of integration can be discussed.

I'm gonna take a look at the lib that asteven has built; it looks very promising.

I'm not sure if this question belongs here, but is it possible with kopf_resources to apply the created CRDs on the fly?
This would make for a very nice development flow where you can just edit the model, restart kopf, and immediately test it.

Regarding the integration: is there a PR that I can look at to figure out how pykube-ng is integrated now?

I'm not sure if this question belongs here, but is it possible with kopf_resources to apply the created CRDs on the fly? This would make for a very nice development flow where you can just edit the model, restart kopf, and immediately test it.

Technically that would probably be easy to do. The question is whether it's a good idea.
I think creating CRDs needs way more, or at least different, privileges than a kopf operator should have at runtime.
You'd probably want a flag/switch to do this only in dev mode, if at all.

That's a valid point. I guess I'm gonna do it with a little dev script then. Thanks for the input!
