kubernetes/enhancements

CustomResourceDefinitions

deads2k opened this issue · 127 comments

Enhancement Description

Scope of work planned for v1.15

  • Define allowed OpenAPI subset (#1002, #692)
  • Define and perform scale testing for CRD (#1015)
  • Bring CRD conversion webhooks to beta (#1004, #598)

Scope of work planned for v1.11

Scope of work planned for v1.10

Scope of work planned for v1.9

Scope of work planned for v1.8

  • Remove deprecated ThirdPartyResource API.
  • Add validation and defaulting for CustomResourceDefinition.
  • Add subresources for CustomResourceDefinition.
    • Support Spec/Status split (/status subresource) on custom resources.
    • Support incrementing object Generation on custom resource data mutation (requires Spec/Status split).
  • Support OwnerReference-based garbage collection with CRD.
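
As a rough illustration of the OwnerReference item above, the sketch below shows a built-in object (a ConfigMap) whose ownerReferences entry points at a custom resource, so that the garbage collector can remove it once the owner is gone. The group example.com, the kind Foo, the object names, and the UID are hypothetical placeholders; in practice the uid must match the owner's real metadata.uid, and a namespaced dependent must live in the same namespace as its namespaced owner.

```yaml
# Hypothetical dependent object owned by a custom resource.
apiVersion: v1
kind: ConfigMap
metadata:
  name: foo123-config
  namespace: default
  ownerReferences:
  - apiVersion: example.com/v1      # group/version of the (assumed) custom resource
    kind: Foo                       # kind served by the (assumed) CRD
    name: foo123
    uid: 00000000-0000-0000-0000-000000000000  # placeholder, copy from the owner
data:
  key: value
```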

Scope of work planned for v1.7

  • Move TPR to a new API group (tentatively called apiextensions) to support deprecation of the extensions group
    • Ideally, implement the new TPR in a separate API server, to be integrated into kube-apiserver via API Aggregation.
  • For now, only allow 1 version at a time per TPR. In the absence of conversion (which is out of scope for this release), this is necessary to remain consistent with the expectations of other components.
    • Support for multiple versions could be added (with or without conversion) in a later release.
  • Fix name conflicts due to lossy conversion of TPR name into resource/kind.
  • Allow TPRs to specify their own names for resources and kinds, rather than tying them to the TPR name.
  • Allow TPRs to register short names that will be discoverable by kubectl.
  • Allow TPRs to optionally be cluster-scoped rather than namespaced.
  • Define and document a process to migrate from extensions/v1beta1 TPR, possibly requiring brief downtime for TPR custom controllers and operators.
    • Where possible, provide automated tools to help with migration.
  • A finalizer ensures CR data is deleted if a CRD is deleted.
  • Fix TPR/CRD data cleanup upon namespace deletion for the 3rd time, this time with a regression test.
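
Several of the items above (custom resource and kind names, short names, cluster scope, a single version) can be sketched in one manifest. This assumes the apiextensions.k8s.io/v1beta1 CustomResourceDefinition API that shipped with 1.7 and has since been superseded by apiextensions.k8s.io/v1; the group example.com and all names are made up for illustration.

```yaml
# Illustrative CRD only; group and names are assumptions.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: loadbalancers.example.com   # must be <plural>.<group>
spec:
  group: example.com
  version: v1                       # one version at a time
  scope: Cluster                    # or Namespaced
  names:
    plural: loadbalancers           # resource name used in URLs
    singular: loadbalancer
    kind: LoadBalancer              # chosen freely, not derived from the CRD name
    shortNames:
    - lb                            # discoverable by kubectl, e.g. "kubectl get lb"
```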

Other plans not in scope for this release

  • Support multiple versions at the same time for a given TPR.
    • Other components (e.g. GC, namespace finalizers) expect automatic conversion. TPR currently does not support that.
    • Note that it's possible to change the single registered version of a TPR, but it requires brief downtime for TPR custom controllers and operators.
    • The extensions/v1beta1 TPR gives the appearance of supporting multiple versions, but multiple version support was never implemented.
  • Support customizing where TPR APIs appear in discovery, relative to other TPRs or other APIs.
  • Support namespace-scoped CRD whose CRs are only visible in one namespace.

Plans with unclear status

Still investigating or TBD. Please comment/edit with any updates.

  • Improve the display of TPRs in kubectl/dashboard.
    • There may be other feature trackers addressing this.

@lavalamp I've created this to try to have a place where we can at least consolidate our thoughts and track progress on third party resources. I've tried to create a list of known shortcomings to be resolved before promotion to stable.

I don't have an owner in mind, but recognition of the problem seems like step 1.

@deads2k I have been learning about third-party resources recently and would also like to help with something.

I've re-ordered the list in terms of what I see as tactical priority. People are trying to use this now and these problems will burn them badly.

If you're comfortable taking the "multiple resources" item, that would be a great start. You could create a separate issue and we can talk about implementation in there.

@deads2k I spent some time trying to reproduce the first issue:

Multiple Resources, single version, different add times - Adding resource A, waiting for it to appear, then adding resource B fails. Resource B is never added.

but without success. Below are my reproduction steps:

  1. Create a custom third-party resource and wait for it to appear
[root@localhost kubernetes]# cat /home/tony/Desktop/debug/lbclaim.yaml
kind: ThirdPartyResource
apiVersion: extensions/v1beta1
metadata:
  name: loadbalancerclaim.k8s.io
description: "Allow user to claim a loadbalancer instance"
versions:
- name: v1
[root@localhost kubernetes]# kc create -f /home/tony/Desktop/debug/lbclaim.yaml
thirdpartyresource "loadbalancerclaim.k8s.io" created
[root@localhost kubernetes]# curl  http://localhost:8080/apis/extensions/v1beta1/thirdpartyresources/
{
  "kind": "ThirdPartyResourceList",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "selfLink": "/apis/extensions/v1beta1/thirdpartyresources/",
    "resourceVersion": "170"
  },
  "items": [
    {
      "metadata": {
        "name": "loadbalancerclaim.k8s.io",
        "selfLink": "/apis/extensions/v1beta1/thirdpartyresources/loadbalancerclaim.k8s.io",
        "uid": "dcb88b3a-9857-11e6-a19b-08002767e1f5",
        "resourceVersion": "146",
        "creationTimestamp": "2016-10-22T13:03:01Z"
      },
      "description": "Allow user to claim a loadbalancer instance",
      "versions": [
        {
          "name": "v1"
        }
      ]
    }
  ]
}
  2. After a moment (more than 10s), create another custom third-party resource
[root@localhost kubernetes]# cat /home/tony/Desktop/debug/loadbalancer.yaml
kind: ThirdPartyResource
apiVersion: extensions/v1beta1
metadata:
  name: loadbalancer.k8s.io
description: "Allow user to curd a loadbalancer instance"
versions:
- name: v1
[root@localhost kubernetes]# kc create -f /home/tony/Desktop/debug/loadbalancer.yaml
thirdpartyresource "loadbalancer.k8s.io" created
  3. Verify both resources exist
[root@localhost kubernetes]# curl http://localhost:8080/apis/k8s.io/v1/
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "k8s.io/v1",
  "resources": [
    {
      "name": "loadbalancerclaims",
      "namespaced": true,
      "kind": "Loadbalancerclaim"
    },
    {
      "name": "loadbalancers",
      "namespaced": true,
      "kind": "Loadbalancer"
    }
  ]
}
[root@localhost kubernetes]# kc get loadbalancers
No resources found.
[root@localhost kubernetes]# kc get loadbalancerclaims
No resources found.

It seems we already support multiple resources with a single version.

I also took a close look at the TPR-related code. The thirdparty_controller runs a periodic sync (every 10 seconds) that installs every new TPR and also handles some deletion work. The ThirdPartyResourceServer holds all installed TPR mappings. As we can see from SyncOneResource and InstallThirdPartyResource, even if the group already exists, it will still update the group with the new API.

I also found that I am able to delete a TPR schema definition even when there are still TPR instances in the system. I think this should not be allowed.

@deads2k I spent some time trying to reproduce the first issue:

Try to enable this test: https://github.com/kubernetes/kubernetes/blob/master/test/integration/thirdparty/thirdparty_test.go#L137 . If it works, we're good. If it fails, something is wrong.

@deads2k Hi David, please take a look at the message I sent on Slack. I have also added a fix for the failing integration test: the third party resource controller now removes the corresponding route handlers when a TPR gets deleted. This helps with the integration test, but I am not sure whether it will introduce any other problems.

For problem #1, it was fixed here:

kubernetes/kubernetes#28414

@brendandburns Actually not. You can run the commented-out integration test, and it will fail.

@brendandburns More precisely, we do support multiple resources with a single version, but the deletion logic has some problems.

@adohe did you file an issue? I can take a look.

@brendandburns you can see here:

https://github.com/kubernetes/kubernetes/blob/master/test/integration/thirdparty/thirdparty_test.go#L137 

Enable this test, and you will see that it fails. I have tried to fix this locally, and I will open a PR later today.

@brendandburns I'm afraid I didn't file an issue.

Also ref kubernetes/kubernetes#32306 (TPR should be deleted when namespace is deleted)

@deads2k can you update the checklist?

All issues are still outstanding. This is actually a feature to track the problems in the (already) beta thirdpartyresources implementation from 1.3. We needed a place to keep track of our problems, but had to devote energy to other efforts in 1.5.

@deads2k I am already working on "Multiple Resources, single version" and "Multiple versions". I think a lot of code needs to be updated.

@deads2k does this feature still target 1.5?

@idvoretskyi I am afraid not :(

@deads2k: ThirdPartyResources should be added to federated APIs.

rmohr commented

@deads2k: Currently field selectors are not working when querying for ThirdPartyObjects, is that something for your list?

@deads2k @rmohr kubectl still has many outstanding capability gaps for TPR; the list above should be updated to track these.

@deads2k: Currently field selectors are not working when querying for ThirdPartyObjects, is that something for your list?

That's a more general problem of inconsistent field selector support across all API types.

I'm starting to look at this as well. ThirdPartyResources are very important to supporting "external" controllers like spark, and before we can add things like sub-resources, we should be fixing this.

Field selectors only work on hand-curated fields in the regular API objects. I would not expect them to work for any fields in TPRs--apiserver isn't built to do arbitrary queries. If you need that behavior TPR will not work for you.

Is the next step here to move the TPRs into an addon API server?
It seems there are some outstanding PRs that fix some of the issues in the list here, and they may be blocked on this item.

/cc @liggitt @deads2k @adohe

sttts commented

To reduce the complexity of TPRs in the apiserver code and to make the TPR logic much more explicit, I would definitely vote for a standalone tpr-apiserver. But IMO this does not really block any of the fixes.

I'm adding some items about handling API semantics (get, list, watch, update, patch) when dealing with multiple non-convertible Kinds. I think that probably needs a design document, since the semantics are unlikely to match normal API semantics.

I'll take (yet another) run at fixing some of these issues...

kubernetes/kubernetes#40260 and kubernetes/kubernetes#40096 will get us in decent shape on the kubectl side

The most severe server-side issue at the moment is the garbage collector losing its mind over ownerRefs that point to TPRs.

Once we get that resolved, we should decide what the API semantics around multiple versions of a given TPR should be, and make sure the TPR type has the data we need. That's likely to affect the server-side storage impl, so I'd rather nail the design down before we do too much server-side work.

@liggitt I'll take a look at reviewing those. thx

Does anyone have a pointer to how to refer to TPRs in RBAC rules? I have a TPR named like foo-bar.something.example.com. As a cluster admin I can get a list of foobars in a given namespace with kubectl get foobars.

When a regular user tries the same thing they get Error from server (Forbidden): the server does not allow access to the requested resource (get foobars.something.example.com).

I've tried every variation of foobar, foo-bar, etc. that I can think of in an RBAC rule with no luck so far.

In the rule, you're looking for resource=foobars apigroup=something.example.com verb=get,list,watch
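
Spelled out as manifests, that rule could look roughly like the sketch below. It is written against the rbac.authorization.k8s.io/v1 API; clusters from the time of this thread would have used an alpha or beta version of the same group. The role name, binding name, namespace, and user are hypothetical.

```yaml
# Grants read access to the custom resource "foobars" in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: foobar-reader              # hypothetical name
  namespace: default
rules:
- apiGroups: ["something.example.com"]   # the TPR's API group
  resources: ["foobars"]                 # plural resource name, not the TPR name
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: foobar-reader-binding      # hypothetical name
  namespace: default
subjects:
- kind: User
  name: jane                       # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: foobar-reader
  apiGroup: rbac.authorization.k8s.io
```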

@deads2k That did the trick. Thanks!

@liggitt

The most severe server-side issue at the moment is the garbage collector losing its mind over ownerRefs that point to TPRs.

Is this related to the TPR cleanup issue?

No, it was an issue with the garbage collector not knowing how to look up ownerRefs to anything other than compiled in types. The reverse issue exists as well, with the garbage collector not paying attention to finalizers on anything other than compiled-in types.

Both of those garbage collector issues are distinct from the need to clean up ThirdPartyResourceData objects reliably when the ThirdPartyResource object is removed.

@liggitt Thanks for the patient explanation. So what's the plan for TPR in 1.6?

Some of the open issues relating to TPR. Not exhaustive.

Group/version problems: kubernetes/kubernetes#24299, kubernetes/kubernetes#36977
Watch: kubernetes/kubernetes#25340
Self link: kubernetes/kubernetes#37622
Namespace deletion: kubernetes/kubernetes#37554
GC: kubernetes/kubernetes#39816
Finalizers: kubernetes/kubernetes#40715
Cleanup of TPR data: kubernetes/kubernetes#35949
Stronger validation of metadata: kubernetes/kubernetes#22768 (comment)
Lack of unit tests: kubernetes/kubernetes#40956
Cleanup: kubernetes/kubernetes#36998

Features that users think are bugs because they work for other resources:
Async behavior: kubernetes/kubernetes#29002
Integers: kubernetes/kubernetes#30213
YAML: kubernetes/kubernetes#37455
Decent kubectl output: kubernetes/kubernetes#31343
Simplify resource naming: kubernetes/kubernetes#29415
Apply: kubernetes/kubernetes#29542, kubernetes/kubernetes#39906
Edit: kubernetes/kubernetes#35993

/cc

Subscribing as we are trying to handle TPRs in Dashboard.

Tracking issues are kubernetes/dashboard#1671 and kubernetes/dashboard#1504.

@kubernetes/dashboard-maintainers

What's the status/plan for non-namespaced TPR? I did not find discussions about it, maybe missed something?

@sttts To start, I am intrigued by the development at Kubernetes, and I want to contribute to it, but Go is a new language for me. What would you recommend I do so that I can work on this project for GSoC 2017?

To add something about me, I am fairly good at C++ and Java and I hold a Bachelor's degree in Computer Science. I have also started reading the documentation and took a Udacity course involving Kubernetes.

sttts commented

@grpndrs we have a list of labeled issues which are a good starting point to get into the code: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Afor-new-contributors. Feel free to contact me in slack and we can go through a few of them.

Is Multiple Resources, single version, different add times still an issue? I can both create and delete multiple TPRs without a problem.

Also, can we number the checkboxes in Outstanding Capabilities so it's easier to refer to? @deads2k I think you can do it like so:

1. - [ ] ...
2. - [ ] ...

Does anyone know how the validation component of this is coming along? I work with TPRs a lot and this feature would be priceless and save A LOT of custom code. I'd love to contribute to this feature but would like to know if anyone subscribed to this issue knows its status.

Does anyone know how the validation component of this is coming along?

I don't expect it to happen for 1.7. At the moment, we're discussing some structural growing pains here kubernetes/community#524 to provide a more stable base to grow upon.

We plan to move forward with https://github.com/kubernetes/community/blob/master/contributors/design-proposals/thirdpartyresources.md in the 1.7 timeframe. I'll make updates here and in the sig-apimachinery calls as we move along.

@deads2k I didn't see anything in there about tpr validation. Wouldn't you consider that to be something that would be needed for beta?

sttts commented

@frankgreco the proposal is about a sound foundation for TPRs to build upon. Features like validation can be added later, but are out of scope here.

I've edited the parent comment of this thread to use the new template, and to clarify the scope of work planned for 1.7, as I understand it. Please look over it and fix/comment.

@deads2k @enisoc We recently started using TPR, and TPR validation is going to be pretty critical to some of our upcoming projects. If we have the resources to work on it, would you consider accepting community contributions to make it happen?

Absolutely. For something like this, we'd want a design proposal before we start looking at pull requests. Also, given how many different approaches are possible, I'd suggest that you list the top three or so ideas and give a brief explanation of why the one you choose is the best. Since it's server-side, performance and security considerations are very important.

Also, since this is a far reaching feature, it's important that it doesn't become a drive-by contribution. Active contributions (reviews, tests, code, migration) for the transition to https://github.com/kubernetes/community/blob/master/contributors/design-proposals/thirdpartyresources.md would help. I'm deads2k on slack if you're interested and want to talk.

Thanks @deads2k! That's totally reasonable. We'll come up with some design proposals for TPR validation. What is the best way of sharing them? I'll get on Slack as well.

sttts commented

@xiao-zhou we are happy to have an accepted Google Summer of Code project around this very topic (was announced just yesterday). Let's chat on Slack about how to collaborate on this. Very cool that you are interested in this as well, so we have quite some force to push this forward!

@xiao-zhou @sttts @deads2k as soon as you have a proposal for TPR validation (and ideally defaulting), would you mind tagging me in the proposal review? Thanks

sttts commented

@sdminonne it will be posted in sig-apimachinery. If you subscribe to that google group, you should get notified.

@sttts thanks

@deads2k are you going to add ObservedGeneration for TPRs?

kubernetes/kubernetes#7328 (comment)

@deads2k are you going to add ObservedGeneration for TPRs?

I wasn't planning to. Couldn't a client which cares simply compare spec and status names?

compare spec and status names?

Not sure what you mean here. Correct me if I am wrong, but I think there are two parts to ObservedGeneration: 1) the API server needs to update metadata.generation every time there is an update in the Spec of the TPR, and 2) the controller responsible for the TPR updates status.observedGeneration based on metadata.generation. I guess 1) is what I am asking you about and 2) is something that TPR authors need to take care of?

Oh, I misunderstood which thing you were asking about. You want observedGeneration for the CustomResource, not the CustomResourceDefinition. I thought that observedGeneration was only bumped for changes to spec that required action. Meaning that an update to metadata didn't trigger it and an update to some spec fields could avoid bumping it as well.

ash2k commented

In my comment linked above I was asking for Generation support for TPR instances, not for TPRs themselves (although that would be nice too. Any reasons to not add it to all objects?).

E.g. if I have Kind: TPR; name: foo.example.com and an instance of that TPR, Kind: Foo; name: foo123, I'm interested in Generation/ObservedGeneration for foo123 so that the Foo controller can let Foo consumers know whether it has processed an update to the foo123 instance. Does that make sense? I don't see how this can be achieved without proper support on the k8s server side.
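
To make the request concrete, here is a sketch of what such a foo123 instance might look like once both pieces exist. It assumes the API server maintains metadata.generation for custom resources and that the Foo controller writes status.observedGeneration itself; the example.com group and the spec.size field are hypothetical.

```yaml
# Hypothetical custom resource instance.
apiVersion: example.com/v1
kind: Foo
metadata:
  name: foo123
  generation: 3            # set by the API server, bumped when spec changes
spec:
  size: 2                  # hypothetical field
status:
  observedGeneration: 2    # last generation the Foo controller has processed
```

In this state a consumer can tell that the controller has not yet acted on the latest spec change, because status.observedGeneration is behind metadata.generation.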

Yeah, generation/observedGeneration makes sense for the user schema of the TPR and not for the actual TPR resource as it has evolved.

@Kargakis The rule is to only increment object generation on spec update, not status, right? If so it means we first need to officially support Spec/Status split on the TPR instance. I was planning to write a proposal for TPR Status, targeting 1.8. I can make sure to include incrementing object generation in the proposal.

The rule is to only increment object generation on spec update, not status, right?

Correct.

If so it means we first need to officially support Spec/Status split on the TPR instance.

Yeah, I expected to find that split as part of the existing issue, but it seems there is more work that needs to happen before we get there.

@Kargakis I've edited the top-level comment to mention these items, although they are out of scope for 1.7.

/cc

@deads2k Should we add a shortname for CustomResourceDefinition?

Added shortname CRD for CustomResourceDefinition.

A design proposal for validation of CustomResources: kubernetes/community#708 😄

@deads2k @enisoc @lavalamp
I was wondering whether the user can configure a k8s controller and/or CRUD methods for CRD objects.

In my particular use-case I create a networks.stable.example.com CRD & use it to create Network object net1:

I need to ensure a new Network CRD object is not allowed to be created if a Network CRD object with an overlapping subnet range already exists

If such mechanism does not exist, I will be happy to put some thoughts together in a design doc.

As mentioned in the 1.7 release notes and docs, TPR is now deprecated and we plan to remove it in 1.8. Users should switch to CRD during the 1.7 timeframe.

Please comment on the tracking issue for removal if you have any questions or concerns.

Updates/Plans for 1.8:

  • Support JSON Schema based validation and defaulting for CustomResources (proposal)
  • Add sub-resources (like status and scale) for CRs (proposal to be out soon)
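
For orientation, the sketch below shows how these two plans might be expressed on a CRD, using the validation and subresources fields of the apiextensions.k8s.io/v1beta1 API. Defaulting is not shown. The group, kind, and the replicas field with its bounds are assumptions made for the example, not taken from the proposals.

```yaml
# Illustrative CRD only; names and schema are assumptions.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: foos.example.com
spec:
  group: example.com
  version: v1
  scope: Namespaced
  names:
    plural: foos
    singular: foo
    kind: Foo
  validation:
    openAPIV3Schema:               # JSON Schema based validation
      properties:
        spec:
          properties:
            replicas:
              type: integer
              minimum: 1
              maximum: 10
  subresources:
    status: {}                     # enables the /status endpoint
    scale:                         # enables the /scale endpoint
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
```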

Thanks @nikhita. I've edited the top comment to reflect 1.8 plans.

ash2k commented

Discovery returns correct information for CRs but REST mapper does not use it - kubernetes/kubernetes#49948

Proposal for SubResources for CustomResources: kubernetes/community#913 🎉

Please forgive my mis-post, but I came to this page from some other Kubernetes page thinking that Kubernetes included a micro-services framework, beyond just managing third party container resources.

Red Hat markets OpenShift Kubernetes as a micro-services platform, yet I can't seem to find this feature. I'm looking for something like an application server to host my own suite of very light-weight, independent application micro-services.

Does such a thing exist, or are we relegated to creating fat Java WAR apps in Spring Boot and deploying them on a Tomcat server that sits inside a Kubernetes-managed container, which is hard to manage and difficult to deploy? I need a micro-services platform where 1 administrator can manage and operate 100s of micro-services.

Does this question make sense?

@hectoralicea this repo is used for planning features worked on by Kubernetes developers.

For general questions like this, please post to the Kubernetes user groups. They're usually much more helpful for this kind of high level discussion :)

See https://groups.google.com/forum/#!forum/kubernetes-users, http://slack.k8s.io/, or Stack Overflow.

sttts commented

@colemickens @deads2k @nikhita @enisoc I have added a section for 1.9.

luxas commented

@sttts Improved beta version in v1.9, right?

sttts commented

@luxas bugfixes of course. But I don't think we have to list that here.

luxas commented

@sttts I was thinking about the CRD validation... is that covered in this features issue and will graduate to beta in v1.9 or?

@luxas from the initial post

Scope of work planned for v1.9

  • CRD validation to beta kubernetes/kubernetes#22768 kubernetes/kubernetes#53829
  • CRD sub-resources as alpha kubernetes/community#913

luxas commented

Oh, thanks @Kargakis, didn't look there 🤦 😄

@deads2k, @enisoc no plans for "stable" in 1.9, right?

@deads2k 👋 Please open a documentation PR and add a link to the tracking spreadsheet. Thanks in advance!

@zacharysarah I seem to have misplaced the spreadsheet link. Docs for CRD validation here kubernetes/website#6066

For the record, the CRD versioning issue exists here: #544.

List of tasks for CRDs moving to GA: kubernetes/kubernetes#58682

@nikhita does it mean that entire CRD feature is moving to GA?

sttts commented

does it mean that entire CRD feature is moving to GA?

The API will move to GA, i.e. to v1, possibly with some beta/alpha sub-features though. It is not yet determined when this will happen, i.e. whether 1.10 is feasible.

@sttts @nikhita can you define the feature roadmap more precisely?

can you define the feature roadmap more precisely?

For 1.10:

There is no exact set of deliverables planned for the next releases but the plan is to go GA by the end of the year (https://groups.google.com/forum/#!topic/kubernetes-sig-api-machinery/07JKqCzQKsc).

We will go to GA once all the issues that are not crossed out in kubernetes/kubernetes#58682 are complete.

When the CRD api goes GA, there might be features in it (example: CustomResourceValidation https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/features/kube_features.go#L35) that could be in alpha/beta.

@sttts @nikhita @deads2k
Any plans for this in 1.11?

If so, can you please ensure the feature is up-to-date with the appropriate:

  • Description
  • Milestone
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

cc @idvoretskyi

Any plans for this in 1.11?

I don't have permissions to edit the PR body (if someone can do that, it'd be great!). But the plan is:

If so, can you please ensure the feature is up-to-date with the appropriate:
Description

The one-line description should be updated to include "Add validation, defaulting, subresources and versioning for CRDs".

Design proposals mentioned in the description need to include:

Can someone please add these in the PR body as well?

Labels:

/kind feature

sttts commented

Can someone please add these in the PR body as well?

done